Your task is to prepare an analytical report for the HR department. Based on the analysis, you are expected to draw up recommendations for the HR department on recruitment strategy and on working with existing employees.
The database holds a set of tables containing data about the employees of a fictional company.
Produce an overview of the company's staff. Draw up a set of research questions and then test them against the data. All analysis must be done in SQL. The data may be visualized afterwards, but the final dataframes for the charts must also be prepared in SQL.
Example hypotheses:
There is a relationship between the performance score and the manager the employee reports to.
The connection parameters are as follows: host: dsstudents.skillbox.ru, port: 5432, database name: human_resources, user: readonly, password: 6hajV34RTQfmxhS. The tables available for analysis are hr_dataset, production_staff, recruiting_costs, and salary_grid.
- For reasons I do not understand, data analysts consider themselves entitled to give recommendations in fields where they are not competent and lack the knowledge, skills, experience, erudition, and breadth of outlook needed to interpret the conclusions of their analysis.
- My firm conviction is that a data analyst has no right to give recommendations outside their competence, especially when those recommendations concern management decisions: strategies, policies, organizational actions. Otherwise the recommendations will be naive, unprofessional, and in most cases harmful. They will often rest on guesswork and an amateurish approach to the subject area, and they will not take into account the full body of information needed for a professional management decision. As a rule, the client does not provide that information for the analysis. Even if the client asks for recommendations based on the analytical study of the data, an explicit caveat clearly limiting the analyst's competence is mandatory.
- Although my professional experience means I am no amateur in management or HR, I will refrain from carrying out this part of the assignment literally: "Based on the analysis, you are expected to draw up recommendations for the HR department on recruitment strategy and on working with existing employees". I plan to limit my conclusions to a list of points that I suggest the HR manager pay special attention to, because they seem to me important, out of line with the overall logic and system, anomalous, and possibly indicative of negative processes. Precisely "possibly indicative", not "indicative". The data provided are insufficient for categorical conclusions.
import sqlalchemy # v. 1.4.27
import psycopg2 # v. 2.8.6
import numpy as np # v. 1.21.2
import pandas as pd # v. 1.3.4
import matplotlib as mpl # v 3.5.0
import matplotlib.pyplot as plt # v 3.5.0
import seaborn as sns # v. 0.11.2
import datetime as dt
# Build the connection string for the database
conn_str = 'postgresql+psycopg2://readonly:6hajV34RTQfmxhS@dsstudents.skillbox.ru:5432/human_resources'
# Create the interface for interacting with the database
engine = sqlalchemy.create_engine(conn_str)
conn = engine.connect()
inspector = sqlalchemy.inspect(engine)
# Get the list of tables available in the database
table_list = inspector.get_table_names()
table_list
['hr_dataset', 'production_staff', 'recruiting_costs', 'salary_grid']
# Write these tables to .csv files for later import into MySQL (for debugging)
for q_table in table_list:  # Loop over the table list; q_table is the table being queried
    sql_quiery = f"select * from {q_table}"  # Build a query that selects all values from the current table
    df = pd.read_sql(sql_quiery, conn, index_col='id')  # Read the table contents into df
    df.to_csv(q_table + ".csv")  # Write df to a csv file
    print(f"{q_table} processed, ok")  # Print a progress message to the console
hr_dataset processed, ok
production_staff processed, ok
recruiting_costs processed, ok
salary_grid processed, ok
For more convenient display of the data, raise the maximum number of rows and columns shown.
pd.set_option('display.max_rows', 400) # raise the maximum number of rows displayed
pd.set_option('display.max_columns', 50) # raise the maximum number of columns displayed
Set the Seaborn plot style.
sns.set_style("darkgrid")
Print information about the dataframes and the first rows of each table.
sql_quiery = f"select * from hr_dataset"
df_hr_dataset=pd.read_sql(sql_quiery, conn, index_col='id')
df_hr_dataset.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 1 to 310
Data columns (total 28 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Employee Name        310 non-null    object
 1   Employee Number      310 non-null    int64
 2   marriedid            310 non-null    int64
 3   maritalstatusid      310 non-null    int64
 4   genderid             310 non-null    int64
 5   empstatus_id         310 non-null    int64
 6   deptid               310 non-null    int64
 7   perf_scoreid         310 non-null    int64
 8   age                  310 non-null    int64
 9   Pay Rate             310 non-null    float64
 10  state                310 non-null    object
 11  zip                  310 non-null    int64
 12  dob                  310 non-null    object
 13  sex                  310 non-null    object
 14  maritaldesc          310 non-null    object
 15  citizendesc          310 non-null    object
 16  Hispanic/Latino      310 non-null    object
 17  racedesc             310 non-null    object
 18  Date of Hire         310 non-null    object
 19  Days Employed        310 non-null    int64
 20  Date of Termination  103 non-null    object
 21  Reason For Term      310 non-null    object
 22  Employment Status    310 non-null    object
 23  department           310 non-null    object
 24  position             310 non-null    object
 25  Manager Name         310 non-null    object
 26  Employee Source      310 non-null    object
 27  Performance Score    310 non-null    object
dtypes: float64(1), int64(10), object(17)
memory usage: 70.2+ KB
df_hr_dataset.head(10)
| id | Employee Name | Employee Number | marriedid | maritalstatusid | genderid | empstatus_id | deptid | perf_scoreid | age | Pay Rate | state | zip | dob | sex | maritaldesc | citizendesc | Hispanic/Latino | racedesc | Date of Hire | Days Employed | Date of Termination | Reason For Term | Employment Status | department | position | Manager Name | Employee Source | Performance Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Brown, Mia | 1103024456 | 1 | 1 | 0 | 1 | 1 | 3 | 30 | 28.50 | MA | 1450 | 1987-11-24 | Female | Married | US Citizen | No | Black or African American | 2008-10-27 | 3317 | None | N/A - still employed | Active | Admin Offices | Accountant I | Brandon R. LeBlanc | Diversity Job Fair | Fully Meets |
| 2 | LaRotonda, William | 1106026572 | 0 | 2 | 1 | 1 | 1 | 3 | 34 | 23.00 | MA | 1460 | 1984-04-26 | Male | Divorced | US Citizen | No | Black or African American | 2014-01-06 | 1420 | None | N/A - still employed | Active | Admin Offices | Accountant I | Brandon R. LeBlanc | Website Banner Ads | Fully Meets |
| 3 | Steans, Tyrone | 1302053333 | 0 | 0 | 1 | 1 | 1 | 3 | 31 | 29.00 | MA | 2703 | 1986-09-01 | Male | Single | US Citizen | No | White | 2014-09-29 | 1154 | None | N/A - still employed | Active | Admin Offices | Accountant I | Brandon R. LeBlanc | Internet Search | Fully Meets |
| 4 | Howard, Estelle | 1211050782 | 1 | 1 | 0 | 1 | 1 | 9 | 32 | 21.50 | MA | 2170 | 1985-09-16 | Female | Married | US Citizen | No | White | 2015-02-16 | 58 | 2015-04-15 | N/A - still employed | Active | Admin Offices | Administrative Assistant | Brandon R. LeBlanc | Pay Per Click - Google | N/A- too early to review |
| 5 | Singh, Nan | 1307059817 | 0 | 0 | 0 | 1 | 1 | 9 | 30 | 16.56 | MA | 2330 | 1988-05-19 | Female | Single | US Citizen | No | White | 2015-05-01 | 940 | None | N/A - still employed | Active | Admin Offices | Administrative Assistant | Brandon R. LeBlanc | Website Banner Ads | N/A- too early to review |
| 6 | Smith, Leigh Ann | 711007713 | 1 | 1 | 0 | 5 | 1 | 3 | 30 | 20.50 | MA | 1844 | 1987-06-14 | Female | Married | US Citizen | No | Asian | 2011-09-26 | 730 | 2013-09-25 | career change | Voluntarily Terminated | Admin Offices | Administrative Assistant | Brandon R. LeBlanc | Diversity Job Fair | Fully Meets |
| 7 | LeBlanc, Brandon R | 1102024115 | 1 | 1 | 1 | 1 | 1 | 3 | 33 | 55.00 | MA | 1460 | 1984-06-10 | Male | Married | US Citizen | No | White | 2016-01-05 | 691 | None | N/A - still employed | Active | Admin Offices | Shared Services Manager | Janet King | Monster.com | Fully Meets |
| 8 | Quinn, Sean | 1206043417 | 1 | 1 | 1 | 5 | 1 | 3 | 33 | 55.00 | MA | 2045 | 1984-11-06 | Male | Married | Eligible NonCitizen | No | Black or African American | 2011-02-21 | 1636 | 2015-08-15 | career change | Voluntarily Terminated | Admin Offices | Shared Services Manager | Janet King | Diversity Job Fair | Fully Meets |
| 9 | Boutwell, Bonalyn | 1307060188 | 1 | 1 | 0 | 1 | 1 | 0 | 31 | 34.95 | MA | 2468 | 1987-04-04 | Female | Married | US Citizen | No | Asian | 2015-02-16 | 1014 | None | N/A - still employed | Active | Admin Offices | Sr. Accountant | Brandon R. LeBlanc | Diversity Job Fair | 90-day meets |
| 10 | Foster-Baker, Amy | 1201031308 | 1 | 1 | 0 | 1 | 1 | 3 | 39 | 34.95 | MA | 2050 | 1979-04-16 | Female | Married | US Citizen | no | White | 2009-01-05 | 3247 | None | N/A - still employed | Active | Admin Offices | Sr. Accountant | Board of Directors | Other | Fully Meets |
Work out the date on which the table was compiled.
sql_quiery = \
"""
SELECT
"Date of Hire",
"Days Employed",
("Date of Hire" + "Days Employed") AS actual_date
FROM
hr_dataset
WHERE
"Date of Termination" is null
;"""
df_actual_date = pd.read_sql(sql_quiery, conn)
df_actual_date.head()
| | Date of Hire | Days Employed | actual_date |
|---|---|---|---|
| 0 | 2008-10-27 | 3317 | 2017-11-26 |
| 1 | 2014-01-06 | 1420 | 2017-11-26 |
| 2 | 2014-09-29 | 1154 | 2017-11-26 |
| 3 | 2015-05-01 | 940 | 2017-11-26 |
| 4 | 2016-01-05 | 691 | 2017-11-26 |
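The same date arithmetic can be cross-checked outside the database. The sketch below is an illustration only (the task itself requires SQL): it takes the first two rows of the result above and adds the days employed to the hire date using Python's standard `datetime` module. The helper name `snapshot_date` is an assumption introduced for this example.

```python
from datetime import date, timedelta


def snapshot_date(date_of_hire: date, days_employed: int) -> date:
    """Return the date on which `days_employed` was measured."""
    return date_of_hire + timedelta(days=days_employed)


# (Date of Hire, Days Employed) pairs copied from the query result above
rows = [
    (date(2008, 10, 27), 3317),
    (date(2014, 1, 6), 1420),
]

# A single-element set means both rows agree on the snapshot date
snapshots = {snapshot_date(hired, days) for hired, days in rows}
print(snapshots)
```

If the set contained more than one date, the "Days Employed" column would not have been computed against a single reference date.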
sql_quiery = f"select * from production_staff"
df_production_staff = pd.read_sql(sql_quiery, conn, index_col='id')
df_production_staff.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 256 entries, 1 to 256
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Employee Name        209 non-null    object
 1   Race Desc            209 non-null    object
 2   Date of Hire         209 non-null    object
 3   TermDate             83 non-null     object
 4   Reason for Term      209 non-null    object
 5   Employment Status    209 non-null    object
 6   Department           209 non-null    object
 7   Position             209 non-null    object
 8   Pay                  209 non-null    object
 9   Manager Name         209 non-null    object
 10  Performance Score    209 non-null    object
 11  Abutments/Hour Wk 1  208 non-null    float64
 12  Abutments/Hour Wk 2  208 non-null    float64
 13  Daily Error Rate     208 non-null    float64
 14  90-day Complaints    208 non-null    float64
dtypes: float64(4), object(11)
memory usage: 32.0+ KB
df_production_staff.head(10)
| id | Employee Name | Race Desc | Date of Hire | TermDate | Reason for Term | Employment Status | Department | Position | Pay | Manager Name | Performance Score | Abutments/Hour Wk 1 | Abutments/Hour Wk 2 | Daily Error Rate | 90-day Complaints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Albert, Michael | White | 2011-08-01 | None | N/A - still employed | Active | Production | Production Manager | $54.50 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Bozzi, Charles | Asian | 2013-09-30 | 2014-08-07 | retiring | Voluntarily Terminated | Production | Production Manager | $50.50 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Butler, Webster L | White | 2016-01-28 | None | N/A - still employed | Active | Production | Production Manager | $55.00 | Elisa Bramante | Exceeds | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Dunn, Amy | White | 2014-09-18 | None | N/A - still employed | Active | Production | Production Manager | $51.00 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Gray, Elijiah | White | 2015-06-02 | None | N/A - still employed | Active | Production | Production Manager | $54.00 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Hogland, Jonathan | White | 2011-01-10 | 2015-12-12 | attendance | Terminated for Cause | Production | Production Manager | $48.00 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Immediato, Walter | Asian | 2011-02-21 | 2012-09-24 | unhappy | Voluntarily Terminated | Production | Production Manager | $42.00 | Elisa Bramante | Needs Improvement | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Liebig, Ketsia | White | 2013-09-30 | None | N/A - still employed | Active | Production | Production Manager | $55.00 | Elisa Bramante | Exceeds | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Miller, Brannon | Hispanic | 2012-08-16 | None | N/A - still employed | Active | Production | Production Manager | $53.00 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
| 10 | Peterson, Ebonee | White | 2010-10-25 | 2016-05-18 | Another position | Voluntarily Terminated | Production | Production Manager | $38.00 | Elisa Bramante | Fully Meets | 0.0 | 0.0 | 0.0 | 0.0 |
sql_quiery = f"select * from recruiting_costs"
df_recruiting_costs =pd.read_sql(sql_quiery, conn, index_col='id')
df_recruiting_costs.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22 entries, 1 to 22
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Employment Source  22 non-null     object
 1   January            22 non-null     int64
 2   February           22 non-null     int64
 3   March              22 non-null     int64
 4   April              22 non-null     int64
 5   May                22 non-null     int64
 6   June               22 non-null     int64
 7   July               22 non-null     int64
 8   August             22 non-null     int64
 9   September          22 non-null     int64
 10  October            22 non-null     int64
 11  November           22 non-null     int64
 12  December           22 non-null     int64
 13  Total              22 non-null     int64
dtypes: int64(13), object(1)
memory usage: 2.6+ KB
df_recruiting_costs.head(25)
| id | Employment Source | January | February | March | April | May | June | July | August | September | October | November | December | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Billboard | 520 | 520 | 520 | 520 | 0 | 0 | 612 | 612 | 729 | 749 | 910 | 500 | 6192 |
| 2 | Careerbuilder | 410 | 410 | 410 | 820 | 820 | 410 | 410 | 820 | 820 | 1230 | 820 | 410 | 7790 |
| 3 | Company Intranet - Partner | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Diversity Job Fair | 0 | 5129 | 0 | 0 | 0 | 0 | 0 | 4892 | 0 | 0 | 0 | 0 | 10021 |
| 5 | Employee Referral | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Glassdoor | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | Information Session | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Internet Search | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | MBTA ads | 640 | 640 | 640 | 640 | 640 | 640 | 640 | 1300 | 1300 | 1300 | 1300 | 1300 | 10980 |
| 10 | Monster.com | 500 | 500 | 500 | 440 | 500 | 500 | 440 | 500 | 440 | 440 | 500 | 500 | 5760 |
| 11 | Newspager/Magazine | 629 | 510 | 293 | 810 | 642 | 675 | 707 | 740 | 772 | 805 | 838 | 870 | 8291 |
| 12 | On-campus Recruiting | 0 | 0 | 2500 | 0 | 0 | 2500 | 0 | 0 | 2500 | 0 | 0 | 0 | 7500 |
| 13 | On-line Web application | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | Other | 0 | 492 | 0 | 829 | 744 | 0 | 610 | 0 | 0 | 510 | 0 | 810 | 3995 |
| 15 | Pay Per Click | 110 | 110 | 60 | 121 | 110 | 109 | 130 | 146 | 105 | 109 | 105 | 110 | 1323 |
| 16 | Pay Per Click - Google | 330 | 330 | 180 | 362 | 197 | 152 | 389 | 437 | 315 | 327 | 315 | 176 | 3509 |
| 17 | Professional Society | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 1200 |
| 18 | Search Engine - Google Bing Yahoo | 330 | 410 | 388 | 372 | 472 | 412 | 416 | 495 | 619 | 502 | 389 | 378 | 5183 |
| 19 | Social Networks - Facebook Twitter etc | 420 | 481 | 452 | 479 | 392 | 508 | 578 | 466 | 389 | 439 | 491 | 478 | 5573 |
| 20 | Vendor Referral | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 21 | Website Banner Ads | 400 | 400 | 300 | 388 | 592 | 610 | 620 | 669 | 718 | 767 | 816 | 865 | 7143 |
| 22 | Word of Mouth | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
sql_quiery = f"select * from salary_grid"
df_salary_grid =pd.read_sql(sql_quiery, conn, index_col='id')
df_salary_grid.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12 entries, 1 to 12
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Position    12 non-null     object
 1   Salary Min  12 non-null     int64
 2   Salary Mid  12 non-null     int64
 3   Salary Max  12 non-null     int64
 4   Hourly Min  12 non-null     float64
 5   Hourly Mid  12 non-null     float64
 6   Hourly Max  12 non-null     float64
dtypes: float64(3), int64(3), object(1)
memory usage: 768.0+ bytes
df_salary_grid.head(12)
| id | Position | Salary Min | Salary Mid | Salary Max | Hourly Min | Hourly Mid | Hourly Max |
|---|---|---|---|---|---|---|---|
| 1 | Administrative Assistant | 30000 | 40000 | 50000 | 14.42 | 19.23 | 24.04 |
| 2 | Sr. Administrative Assistant | 35000 | 45000 | 55000 | 16.83 | 21.63 | 26.44 |
| 3 | Accountant I | 42274 | 51425 | 62299 | 20.32 | 24.72 | 29.95 |
| 4 | Accountant II | 50490 | 62158 | 74658 | 24.27 | 29.88 | 35.89 |
| 5 | Sr. Accountant | 63264 | 76988 | 92454 | 30.42 | 37.01 | 44.45 |
| 6 | Network Engineer | 50845 | 66850 | 88279 | 24.44 | 32.14 | 42.44 |
| 7 | Sr. Network Engineer | 79428 | 99458 | 120451 | 38.19 | 47.82 | 57.91 |
| 8 | Database Administrator | 50569 | 68306 | 93312 | 24.31 | 32.84 | 44.86 |
| 9 | Sr. DBA | 92863 | 116007 | 139170 | 44.65 | 55.77 | 66.91 |
| 10 | Production Technician I | 30000 | 40000 | 50000 | 14.42 | 19.23 | 24.04 |
| 11 | Production Technician II | 38000 | 48000 | 58000 | 18.27 | 23.08 | 27.88 |
| 12 | Lead Production Technician | 45000 | 55000 | 65000 | 21.63 | 26.44 | 31.25 |
CONCLUSION
A first look at the content and structure of the tables in the database leads to the following conclusions.
(Examples of the inconsistencies are given below.)
print("For example:")
print("Positions present in salary_grid but missing from hr_dataset: ", "\n",
      (set(df_salary_grid.Position.tolist()) - set(df_hr_dataset.position.tolist())), "\n")
print("Positions present in hr_dataset but missing from salary_grid: ", "\n",
      (set(df_hr_dataset.position.tolist()) - set(df_salary_grid["Position"].tolist())), "\n")
print("Manager names that, because of spelling differences, are missing from the employee field of hr_dataset: ", "\n",
      (set(df_hr_dataset["Manager Name"].tolist()) - set(df_hr_dataset["Employee Name"].tolist())), "\n")
For example:
Positions present in salary_grid but missing from hr_dataset:
{'Sr. Administrative Assistant', 'Lead Production Technician', 'Accountant II'}
Positions present in hr_dataset but missing from salary_grid:
{'Production Manager', 'President & CEO', 'IT Manager - Infra', 'Data Architect', 'IT Support', 'BI Developer', 'Area Sales Manager', 'Software Engineer', 'Sales Manager', 'BI Director', 'IT Manager - Support', 'Director of Operations', 'Senior BI Developer', 'CIO', 'Director of Sales', 'Software Engineering Manager', 'IT Manager - DB', 'Shared Services Manager', 'IT Director'}
Manager names that, because of spelling differences, are missing from the employee field of hr_dataset:
{'Board of Directors', 'Debra Houlihan', 'Michael Albert', 'Brian Champaigne', 'Janet King', 'Eric Dougall', 'Webster Butler', 'Alex Sweetwater', 'Kissy Sullivan', 'Brandon R. LeBlanc', 'Ketsia Liebig', 'Elijiah Gray', 'Amy Dunn', 'Brannon Miller', 'Lynn Daneault', 'David Stanley', 'Peter Monroe', 'Kelley Spirea', 'Simon Roup', 'Jennifer Zamora', 'John Smith'}
So, we have 4 tables:
Staff can also be selected by department from the general register. However, the general register has no data on the KPIs that are present in the production department's staff register, so KPIs cannot be compared across departments. It is possible, though, that they are specific to each department.
The main table for the analysis is hr_dataset. It contains data on all employees: those working now, those who worked previously, and those due to start in the near future.
To analyze the company's current staff, we need to find out which employees are currently on the payroll and which have already left. To do this, we output the employment statuses and the number of employees in each.
We organize the staff overview by the following sections and groups of indicators.
# Select the employment statuses from hr_dataset and count the employees in each
sql_quiery = \
"""
(SELECT
"Employment Status",
COUNT("Employee Number") AS "Employee Count"
FROM
hr_dataset
GROUP BY
"Employment Status"
ORDER BY
"Employment Status"
)
UNION ALL
(SELECT
'TOTALS',
COUNT("Employee Number") AS "Employee Count"
FROM
hr_dataset
)
;
"""
df_empstatus = pd.read_sql(sql_quiery, conn)
df_empstatus
| | Employment Status | Employee Count |
|---|---|---|
| 0 | Active | 183 |
| 1 | Future Start | 11 |
| 2 | Leave of Absence | 14 |
| 3 | Terminated for Cause | 14 |
| 4 | Voluntarily Terminated | 88 |
| 5 | TOTALS | 310 |
# Draw a pie chart
# Select the data to plot (all rows except the final TOTALS row)
dfg_empstatus = df_empstatus[:-1]
# Set up the figure and axes
fig, ax = plt.subplots(figsize=(16, 6), ncols=1, nrows=1)
fig.suptitle("Distribution of employment statuses", fontsize=16, y=0.9)
ax.pie(dfg_empstatus["Employee Count"],
       labels=dfg_empstatus["Employment Status"],
       autopct='%1.1f%%',
       textprops={'fontsize': 12},
       pctdistance=0.8,
       explode=(0, 0, 0, 0.1, 0.1)
       )
plt.show()
CONCLUSION
In the analysis of the parameters below, we look primarily at current employees and the current staff as of the present date.
# Select the departments from hr_dataset
sql_quiery = \
"""
(SELECT
COALESCE(department, '[TOTAL]') AS "Department",
COUNT("Employee Number") AS Number_of_employees
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(department)
ORDER BY
department
)
;
"""
df_departments = pd.read_sql(sql_quiery, conn)
df_departments
| | Department | number_of_employees |
|---|---|---|
| 0 | Admin Offices | 8 |
| 1 | Executive Office | 1 |
| 2 | IT/IS | 40 |
| 3 | Production | 125 |
| 4 | Sales | 27 |
| 5 | Software Engineering | 7 |
| 6 | [TOTAL] | 208 |
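The three-way `OR` filter on "Employment Status" recurs in every query about current staff, and an `IN` list expresses the same condition more compactly. The sketch below shows the equivalence on a tiny in-memory SQLite table rather than on the PostgreSQL database used in this notebook. The rows and employee numbers are invented for illustration, SQLite is used only because it ships with Python, and `ROLLUP` is left out because SQLite does not support it.

```python
import sqlite3

# Hypothetical rows: (Employee Number, Employment Status, department)
toy_rows = [
    (1, "Active", "Production"),
    (2, "Future Start", "Production"),
    (3, "Leave of Absence", "IT/IS"),
    (4, "Voluntarily Terminated", "Production"),
    (5, "Terminated for Cause", "Sales"),
]

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE hr_dataset ("Employee Number" INTEGER, '
    '"Employment Status" TEXT, department TEXT)'
)
con.executemany("INSERT INTO hr_dataset VALUES (?, ?, ?)", toy_rows)

# The filter as written in the notebook's queries
query_or = """
SELECT department, COUNT("Employee Number")
FROM hr_dataset
WHERE "Employment Status" = 'Active' OR
      "Employment Status" = 'Future Start' OR
      "Employment Status" = 'Leave of Absence'
GROUP BY department
ORDER BY department
"""

# The same filter expressed with IN
query_in = """
SELECT department, COUNT("Employee Number")
FROM hr_dataset
WHERE "Employment Status" IN ('Active', 'Future Start', 'Leave of Absence')
GROUP BY department
ORDER BY department
"""

result_or = con.execute(query_or).fetchall()
result_in = con.execute(query_in).fetchall()
con.close()

print(result_in)
```

Both forms return the same rows; the `IN` version is simply easier to keep consistent when the list of "current" statuses is repeated across several queries.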
# Draw a pie chart
# Select the data to plot (all rows except the final [TOTAL] row)
dfg_departments = df_departments[:-1]
# Set up the figure and axes
fig, ax = plt.subplots(figsize=(16, 7), ncols=1, nrows=1)
fig.suptitle("Distribution of current employees by department", fontsize=16, y=0.825)
cmap = plt.colormaps["Set2"]  # Choose the color palette
my_colors = cmap(np.arange(6) * 1)  # Pick the chart colors: np.arange(number of segments) * step
ax.pie(dfg_departments["number_of_employees"],
       labels=dfg_departments["Department"],
       autopct='%1.1f%%',
       textprops={'fontsize': 12},
       pctdistance=0.8,
       radius=0.8,
       colors=my_colors,
       explode=(0, 0, 0.2, 0, 0, 0)
       )
plt.show()
CONCLUSION
# Select the positions of current employees and their distribution across departments from hr_dataset
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') AS Department,
COALESCE(position, '[SUBTOTAL]') AS positions,
COUNT("Employee Number") AS "Employee Count",
ROUND(
-- Cast the "Employee Number" count to a numeric value with precision 9 and
-- 2 decimal places
CAST(
COUNT("Employee Number")
AS numeric(9, 2)) /
-- The final result is likewise rounded to 2 decimal places
(SELECT COUNT("Employee Number") FROM hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence') * 100, 2)
AS percent_of_all
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(
department,
position
)
ORDER BY
department,
position,
"Employee Count"
;"""
df_positions = pd.read_sql(sql_quiery, conn, index_col=['department', 'positions'])
df_positions
| department | positions | Employee Count | percent_of_all |
|---|---|---|---|
| Admin Offices | Accountant I | 3 | 1.44 |
| | Administrative Assistant | 2 | 0.96 |
| | Shared Services Manager | 1 | 0.48 |
| | Sr. Accountant | 2 | 0.96 |
| | [SUBTOTAL] | 8 | 3.85 |
| Executive Office | President & CEO | 1 | 0.48 |
| | [SUBTOTAL] | 1 | 0.48 |
| IT/IS | BI Developer | 4 | 1.92 |
| | BI Director | 1 | 0.48 |
| | CIO | 1 | 0.48 |
| | Data Architect | 1 | 0.48 |
| | Database Administrator | 8 | 3.85 |
| | IT Director | 1 | 0.48 |
| | IT Manager - DB | 1 | 0.48 |
| | IT Manager - Infra | 1 | 0.48 |
| | IT Manager - Support | 1 | 0.48 |
| | IT Support | 4 | 1.92 |
| | Network Engineer | 8 | 3.85 |
| | Senior BI Developer | 3 | 1.44 |
| | Sr. DBA | 1 | 0.48 |
| | Sr. Network Engineer | 5 | 2.40 |
| | [SUBTOTAL] | 40 | 19.23 |
| Production | Director of Operations | 1 | 0.48 |
| | Production Manager | 9 | 4.33 |
| | Production Technician I | 84 | 40.38 |
| | Production Technician II | 31 | 14.90 |
| | [SUBTOTAL] | 125 | 60.10 |
| Sales | Area Sales Manager | 24 | 11.54 |
| | Director of Sales | 1 | 0.48 |
| | Sales Manager | 2 | 0.96 |
| | [SUBTOTAL] | 27 | 12.98 |
| Software Engineering | Software Engineer | 6 | 2.88 |
| | Software Engineering Manager | 1 | 0.48 |
| | [SUBTOTAL] | 7 | 3.37 |
| [TOTAL] | [SUBTOTAL] | 208 | 100.00 |
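The `percent_of_all` column can be reproduced from the subtotal counts alone. The sketch below redoes the division for the six department subtotals shown in the table and rounds to two decimal places with `decimal.Decimal`. It assumes that rounding halves away from zero matches what PostgreSQL's `ROUND` does on `numeric` values. It is a cross-check in Python, not part of the SQL-only analysis, and the function name is introduced here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Department subtotals of current employees, copied from the table above
subtotals = {
    "Admin Offices": 8,
    "Executive Office": 1,
    "IT/IS": 40,
    "Production": 125,
    "Sales": 27,
    "Software Engineering": 7,
}

total = sum(subtotals.values())


def percent_of_all(count: int, total: int) -> Decimal:
    """Share of `count` in `total`, in percent, rounded to 2 decimal places."""
    share = Decimal(count) / Decimal(total) * 100
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


shares = {dept: percent_of_all(n, total) for dept, n in subtotals.items()}
for dept, share in shares.items():
    print(f"{dept}: {share}")
```

The subtotals add up to the `[TOTAL]` row, and each computed share can be compared against the corresponding `[SUBTOTAL]` line.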
CONCLUSION
(current employees only)
First, count the subordinates of managers at every level.
sql_quiery = \
"""
SELECT
"Manager Name",
department AS "Sub-department",
position AS "Subordinates",
COUNT("Employee Number")
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
"Manager Name",
"Sub-department",
"Subordinates"
ORDER BY
"Manager Name",
"Sub-department",
"Subordinates"
;"""
df_subordination1 = pd.read_sql(sql_quiery, conn, index_col=["Manager Name", "Sub-department", "Subordinates"])
df_subordination1
| Manager Name | Sub-department | Subordinates | count |
|---|---|---|---|
| Alex Sweetwater | Software Engineering | Software Engineer | 6 |
| Amy Dunn | Production | Production Technician I | 7 |
| | | Production Technician II | 1 |
| Board of Directors | Admin Offices | Sr. Accountant | 1 |
| | Executive Office | President & CEO | 1 |
| Brandon R. LeBlanc | Admin Offices | Accountant I | 3 |
| | | Administrative Assistant | 2 |
| | | Sr. Accountant | 1 |
| Brannon Miller | Production | Production Technician I | 13 |
| | | Production Technician II | 2 |
| Brian Champaigne | IT/IS | BI Developer | 4 |
| | | Data Architect | 1 |
| | | Senior BI Developer | 3 |
| David Stanley | Production | Production Technician I | 12 |
| | | Production Technician II | 3 |
| Debra Houlihan | Sales | Sales Manager | 2 |
| Elijiah Gray | Production | Production Technician I | 9 |
| | | Production Technician II | 5 |
| Eric Dougall | IT/IS | IT Support | 4 |
| Janet King | Admin Offices | Shared Services Manager | 1 |
| | IT/IS | CIO | 1 |
| | Production | Director of Operations | 1 |
| | | Production Manager | 9 |
| | Sales | Director of Sales | 1 |
| Jennifer Zamora | IT/IS | BI Director | 1 |
| | | IT Director | 1 |
| | | IT Manager - DB | 1 |
| | | IT Manager - Infra | 1 |
| | | IT Manager - Support | 1 |
| | Software Engineering | Software Engineering Manager | 1 |
| John Smith | Sales | Area Sales Manager | 11 |
| Kelley Spirea | Production | Production Technician I | 11 |
| | | Production Technician II | 5 |
| Ketsia Liebig | Production | Production Technician I | 12 |
| | | Production Technician II | 4 |
| Kissy Sullivan | Production | Production Technician I | 6 |
| | | Production Technician II | 4 |
| Lynn Daneault | Sales | Area Sales Manager | 13 |
| Michael Albert | Production | Production Technician I | 10 |
| | | Production Technician II | 3 |
| Peter Monroe | IT/IS | Network Engineer | 8 |
| | | Sr. Network Engineer | 5 |
| Simon Roup | IT/IS | Database Administrator | 8 |
| | | Sr. DBA | 1 |
| Webster Butler | Production | Production Technician I | 4 |
| | | Production Technician II | 4 |
Now determine the full structure of the organization by department, based on the general register's data on current employees.
Comment:
Managers are employees too. However, the source data contain many errors: missing or extra spaces, and middle initials that are recorded in some fields but not in others. In addition, manager names and employee names put the first name and surname in different orders. I will try to bring the names of managers and of employees (the same managers) to a single format, so that the data in the previous table can be supplemented with the managers' positions and a hierarchical structure of the company can be built.
# Create a helper dataframe in which we normalize employee and manager names and match each manager
# to the employee number under which that manager is listed. Use data for current employees only
sql_quiery = \
"""
-- ---------------------
-- Create temporary views to normalize employee and manager names
-- and to assign managers their corresponding employee numbers.
-- ---------------------
-- --------------------
-- Normalize employee names. The CASE branches are mutually exclusive.
-- This code is not universal: it does not fix every possible error, only those present in the manager names
-- and in the matching employee names.
-- -----------------
-- Remove trailing spaces from employee names
CREATE OR REPLACE TEMP VIEW
EmployeesNameTrim
AS
(SELECT
"Employee Name",
TRIM(TRAILING ' ' FROM "Employee Name") AS empl_name_norm,
"Employee Number",
"Employment Status",
department,
position,
"Manager Name"
FROM hr_dataset
WHERE -- Condition: current employees only
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
)
;
-- If there is no space after the comma between the surname and the first name, add one
CREATE OR REPLACE TEMP VIEW
EmployeesMissingSpaces
AS
(SELECT
"Employee Name",
(CASE
-- Rebuild the name only when it contains no comma followed by a space
WHEN STRPOS(empl_name_norm, ', ') = 0
THEN
CONCAT(
SPLIT_PART(empl_name_norm, ',', 1),', ',SPLIT_PART (empl_name_norm, ',', 2)
)
ELSE
empl_name_norm
END
) AS empl_name_norm,
"Employee Number",
"Employment Status",
department,
position,
"Manager Name"
FROM
EmployeesNameTrim
)
;
-- If there are extra spaces inside the field, replace them with a single space
CREATE OR REPLACE TEMP VIEW
EmployeesReducedSpaces
AS
(SELECT
"Employee Name",
(CASE
WHEN STRPOS(empl_name_norm, '  ') <> 0
THEN
REPLACE(empl_name_norm, '  ', ' ')
ELSE
empl_name_norm
END
) AS empl_name_norm,
"Employee Number",
"Employment Status",
department,
position,
"Manager Name"
FROM
EmployeesMissingSpaces
)
;
-- --------------------
-- Normalize manager names. The normalization here is more specific. The CASE branches are mutually exclusive.
-- This code is not universal: it does not fix every possible error, only those present in the manager names
-- and in the matching employee names.
-- -----------------
CREATE OR REPLACE TEMP VIEW
ManagersNorm
AS
(SELECT
"Employee Name",
empl_name_norm,
"Employee Number",
"Employment Status",
department,
position,
"Manager Name",
(CASE
-- Special case for 'Board of Directors': leave it as is.
WHEN "Manager Name" = 'Board of Directors'
THEN 'Board of Directors'
-- Special case for 'Butler, Webster L': in the manager list he appears without the 'L'.
WHEN "Manager Name" = 'Webster Butler'
THEN CONCAT(
SPLIT_PART("Manager Name", ' ', 2),
', ',
SPLIT_PART ("Manager Name", ' ', 1),
' L'
)
-- When a middle initial is present, i.e. Manager Name has a third word
-- (format: First M. Surname), reorder the first name, middle initial and surname and put a comma after the surname.
WHEN SPLIT_PART("Manager Name", ' ', 3) <> ''
THEN CONCAT(
SPLIT_PART("Manager Name", ' ', 3),
', ',
SPLIT_PART ("Manager Name", ' ', 1),
' ',
REPLACE(SPLIT_PART("Manager Name", ' ', 2), '.', '') -- the period has to be removed here
)
-- Все остальные случаи: поменяем местами имя и фамилию и разделим их запятой с пробелом (стандартный вариант)
ELSE
CONCAT(
SPLIT_PART("Manager Name", ' ', 2),
', ',
SPLIT_PART ("Manager Name", ' ', 1)
)
END
) AS manager_name_norm
FROM
EmployeesReducedSpaces
)
;
-- --------------------
-- To build the company structure later, put each manager's own employee number into a separate column.
-- It will serve as an ID from here on.
-- --------------------
-- Determine the managers' employee numbers by matching managers to employees on the normalized names.
CREATE OR REPLACE TEMP VIEW
EmployeesAndManagers
AS
(SELECT
"Employee Name",
empl_name_norm,
"Employee Number",
"Employment Status",
department,
position,
"Manager Name",
manager_name_norm,
(SELECT
"Employee Number"
FROM
(SELECT
"Employee Number",
empl_name_norm
FROM
ManagersNorm
) AS employees
WHERE
empl_name_norm = manager_name_norm
) AS "Manager Number"
FROM ManagersNorm
)
;
SELECT
*
FROM
EmployeesAndManagers
;"""
df_empl_and_manag_norm = pd.read_sql(sql_quiery, conn)
df_empl_and_manag_norm.head()
| | Employee Name | empl_name_norm | Employee Number | Employment Status | department | position | Manager Name | manager_name_norm | Manager Number |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Brown, Mia | Brown, Mia | 1103024456 | Active | Admin Offices | Accountant I | Brandon R. LeBlanc | LeBlanc, Brandon R | 1.102024e+09 |
| 1 | LaRotonda, William | LaRotonda, William | 1106026572 | Active | Admin Offices | Accountant I | Brandon R. LeBlanc | LeBlanc, Brandon R | 1.102024e+09 |
| 2 | Steans, Tyrone | Steans, Tyrone | 1302053333 | Active | Admin Offices | Accountant I | Brandon R. LeBlanc | LeBlanc, Brandon R | 1.102024e+09 |
| 3 | Howard, Estelle | Howard, Estelle | 1211050782 | Active | Admin Offices | Administrative Assistant | Brandon R. LeBlanc | LeBlanc, Brandon R | 1.102024e+09 |
| 4 | Singh, Nan | Singh, Nan | 1307059817 | Active | Admin Offices | Administrative Assistant | Brandon R. LeBlanc | LeBlanc, Brandon R | 1.102024e+09 |
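Note that pandas shows Manager Number as float64 rather than as an integer. The column contains NULLs (rows whose manager is 'Board of Directors' match no employee), and pandas stores missing values in a numeric column as NaN, which forces float64. This does not matter here, because the JOIN operations take the manager number from the database through SQL queries, not from the pandas DataFrame.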
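The manager-name normalization done by the CASE expression in the query above can be mirrored in plain Python, which is convenient for checking individual names without a database connection. This is a minimal sketch rather than the notebook's method: the names `split_part` and `normalize_manager_name` are hypothetical, and only the four branches of the CASE expression are covered.

```python
def split_part(text: str, sep: str, n: int) -> str:
    """Mimic PostgreSQL SPLIT_PART: 1-based field index, '' when the field is missing."""
    fields = text.split(sep)
    return fields[n - 1] if n <= len(fields) else ""


def normalize_manager_name(name: str) -> str:
    """Convert 'First [M.] Last' into 'Last, First [M]' (same branches as the SQL CASE)."""
    # Special case: the board is not a person, keep it unchanged.
    if name == "Board of Directors":
        return name
    # Special case: this manager is stored without the middle initial 'L'.
    if name == "Webster Butler":
        return "Butler, Webster L"
    first = split_part(name, " ", 1)
    second = split_part(name, " ", 2)
    third = split_part(name, " ", 3)
    if third != "":
        # 'First M. Last' -> 'Last, First M' (the period after the initial is dropped).
        return f"{third}, {first} {second.replace('.', '')}"
    # Standard case: 'First Last' -> 'Last, First'.
    return f"{second}, {first}"


print(normalize_manager_name("Brandon R. LeBlanc"))
```

The order of the checks matters in the same way as in the SQL: the two special cases must come before the generic branches, otherwise 'Board of Directors' would be treated as a name with a middle word.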
Now, using successive JOIN operations, we build the overall organizational structure from this table, listing departments, positions, and employee names. At the lowest level, where employees are not managers, we count the employees reporting to the manager above them.
Comment:
If a manager at some level has no subordinates, all columns of the following levels in that row are NaN, and the final count of lowest-level subordinates is 0. Bear in mind that such a manager appears in that row as the sole subordinate of their superior.
The names of the lowest-level employees could in principle be listed, but I considered that data redundant and limited myself to counting them.
At the 4th level the department columns are redundant: departments are not subdivided below the 3rd level.
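The level-by-level JOIN chain described above fixes the depth of the hierarchy in advance. As an alternative sketch (a swapped-in technique, not what the notebook's queries do), the same walk can be written as a recursive CTE, which PostgreSQL also supports. The example runs on SQLite through Python's standard `sqlite3` module; the table `staff(emp_no, name, manager_no)`, its rows, and the variable names are all hypothetical toy data.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE staff (emp_no INTEGER PRIMARY KEY, name TEXT, manager_no INTEGER);
INSERT INTO staff VALUES
    (1, 'Head', NULL),        -- top of the hierarchy: no manager row
    (2, 'Manager A', 1),
    (3, 'Manager B', 1),
    (4, 'Worker 1', 2),
    (5, 'Worker 2', 2);
""")

# Walk the hierarchy from the top; each recursion step adds one level of subordination.
levels = conn_demo.execute("""
WITH RECURSIVE chain(emp_no, name, level) AS (
    SELECT emp_no, name, 1 FROM staff WHERE manager_no IS NULL
    UNION ALL
    SELECT s.emp_no, s.name, c.level + 1
    FROM staff AS s
    JOIN chain AS c ON s.manager_no = c.emp_no
)
SELECT level, name FROM chain ORDER BY level, name
""").fetchall()

# Count direct reports; LEFT JOIN keeps employees without subordinates with a count of 0.
direct_reports = dict(conn_demo.execute("""
SELECT m.name, COUNT(s.emp_no)
FROM staff AS m
LEFT JOIN staff AS s ON s.manager_no = m.emp_no
GROUP BY m.emp_no, m.name
""").fetchall())

print(levels)
print(direct_reports)
```

The recursive form does not need to know the number of levels beforehand, while the explicit chain of views yields one wide row per reporting line, which is what the pivot-style output of this notebook relies on.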
# Build the hierarchy table of departments and managers
sql_quiery = \
"""
-- Create temporary views for this purpose.
-- Start with the subordinates of the manager 'Board of Directors': level 1
CREATE OR REPLACE TEMP VIEW Level1
AS
SELECT
manager_name_norm AS "Top Manager",
department AS "Sub1 Department",
position AS "Sub1 Position",
empl_name_norm AS "Sub1 Name",
"Employee Number" AS "Sub1 Number"
FROM
EmployeesAndManagers
WHERE
"Manager Name" = 'Board of Directors'
;
-- Second level of subordination
CREATE OR REPLACE TEMP VIEW Level2
AS
SELECT
"Top Manager",
"Sub1 Department",
"Sub1 Position",
"Sub1 Name" ,
"Sub1 Number",
department AS "Sub2 Department",
position AS "Sub2 Position",
empl_name_norm AS "Sub2 Name",
"Employee Number" AS "Sub2 Number"
FROM
Level1
LEFT JOIN
EmployeesAndManagers
ON "Sub1 Number" = "Manager Number"
;
-- Third level of subordination
CREATE OR REPLACE TEMP VIEW Level3
AS
SELECT
"Top Manager",
"Sub1 Department",
"Sub1 Position",
"Sub1 Name" ,
"Sub1 Number",
"Sub2 Department",
"Sub2 Position",
"Sub2 Name",
"Sub2 Number",
department AS "Sub3 Department",
position AS "Sub3 Position",
empl_name_norm AS "Sub3 Name",
"Employee Number" AS "Sub3 Number"
FROM
Level2
LEFT JOIN
EmployeesAndManagers
ON "Sub2 Number" = "Manager Number"
;
-- Fourth level of subordination
CREATE OR REPLACE TEMP VIEW Level4
AS
SELECT
"Top Manager",
"Sub1 Department",
"Sub1 Position",
"Sub1 Name" ,
"Sub1 Number",
"Sub2 Department",
"Sub2 Position",
"Sub2 Name",
"Sub2 Number",
"Sub3 Department",
"Sub3 Position",
"Sub3 Name",
"Sub3 Number",
department AS "Sub4 Department",
position AS "Sub4 Position",
empl_name_norm AS "Sub4 Name",
"Employee Number" AS "Sub4 Number"
FROM
Level3
LEFT JOIN
EmployeesAndManagers
ON "Sub3 Number" = "Manager Number"
;
-- A fifth level of subordination does not exist (as analogous code shows)
SELECT
"Top Manager",
"Sub1 Department",
"Sub1 Position",
"Sub1 Name" ,
"Sub2 Department",
"Sub2 Position",
"Sub2 Name",
"Sub3 Department",
"Sub3 Position",
"Sub3 Name",
-- "Sub4 Department",
"Sub4 Position",
-- "Sub4 Name",
COUNT("Sub4 Name") AS "sub5_count"
FROM Level4
GROUP BY
"Top Manager",
"Sub1 Department",
"Sub1 Position",
"Sub1 Name" ,
"Sub2 Department",
"Sub2 Position",
"Sub2 Name",
"Sub3 Department",
"Sub3 Position",
"Sub3 Name",
-- "Sub4 Department",
"Sub4 Position"
ORDER BY
"Sub1 Department",
"Sub1 Position",
"Sub2 Department",
"Sub2 Position",
"Sub3 Department",
"Sub3 Position",
-- "Sub4 Department",
"Sub4 Position"
;
"""
df_subordination2 = pd.read_sql(sql_quiery,
conn,
index_col = [
"Top Manager",
"Sub1 Department",
"Sub1 Position",
"Sub1 Name" ,
"Sub2 Department",
"Sub2 Position",
"Sub2 Name",
"Sub3 Department",
"Sub3 Position",
"Sub3 Name",
"Sub4 Position"]
)
df_subordination2
| Top Manager | Sub1 Department | Sub1 Position | Sub1 Name | Sub2 Department | Sub2 Position | Sub2 Name | Sub3 Department | Sub3 Position | Sub3 Name | Sub4 Position | sub5_count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Board of Directors | Admin Offices | Sr. Accountant | Foster-Baker, Amy | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 |
| Executive Office | President & CEO | King, Janet | Admin Offices | Shared Services Manager | LeBlanc, Brandon R | Admin Offices | Accountant I | Brown, Mia | NaN | 0 | |
| LaRotonda, William | NaN | 0 | |||||||||
| Steans, Tyrone | NaN | 0 | |||||||||
| Administrative Assistant | Howard, Estelle | NaN | 0 | ||||||||
| Singh, Nan | NaN | 0 | |||||||||
| Sr. Accountant | Boutwell, Bonalyn | NaN | 0 | ||||||||
| IT/IS | CIO | Zamora, Jennifer | IT/IS | BI Director | Champaigne, Brian | BI Developer | 4 | ||||
| Data Architect | 1 | ||||||||||
| Senior BI Developer | 3 | ||||||||||
| IT Director | Foss, Jason | NaN | 0 | ||||||||
| IT Manager - DB | Roup, Simon | Database Administrator | 8 | ||||||||
| Sr. DBA | 1 | ||||||||||
| IT Manager - Infra | Monroe, Peter | Network Engineer | 8 | ||||||||
| Sr. Network Engineer | 5 | ||||||||||
| IT Manager - Support | Dougall, Eric | IT Support | 4 | ||||||||
| Software Engineering | Software Engineering Manager | Sweetwater, Alex | Software Engineer | 6 | |||||||
| Production | Director of Operations | Bramante, Elisa | NaN | NaN | NaN | NaN | 0 | ||||
| Production Manager | Albert, Michael | Production | Production Technician I | Adinolfi, Wilson K | NaN | 0 | |||||
| Chace, Beatrice | NaN | 0 | |||||||||
| Crimmings, Jean | NaN | 0 | |||||||||
| Gentry, Mildred | NaN | 0 | |||||||||
| Handschiegl, Joanne | NaN | 0 | |||||||||
| Keatts, Kramer | NaN | 0 | |||||||||
| Medeiros, Jennifer | NaN | 0 | |||||||||
| Owad, Clinton | NaN | 0 | |||||||||
| Sullivan, Timothy | NaN | 0 | |||||||||
| Von Massenbach, Anna | NaN | 0 | |||||||||
| Butler, Webster L | Production | Production Technician I | Becker, Scott | NaN | 0 | ||||||
| Chang, Donovan E | NaN | 0 | |||||||||
| Rivera, Haley | NaN | 0 | |||||||||
| Sewkumar, Nori | NaN | 0 | |||||||||
| Dunn, Amy | Production | Production Technician I | Anderson, Linda | NaN | 0 | ||||||
| Bernstein, Sean | NaN | 0 | |||||||||
| Desimone, Carl | NaN | 0 | |||||||||
| Fernandes, Nilson | NaN | 0 | |||||||||
| Girifalco, Evelyn | NaN | 0 | |||||||||
| Harrison, Kara | NaN | 0 | |||||||||
| Shields, Seffi | NaN | 0 | |||||||||
| Gray, Elijiah | Production | Production Technician I | Alagbe, Trina | NaN | 0 | ||||||
| Beatrice, Courtney | NaN | 0 | |||||||||
| Chan, Lin | NaN | 0 | |||||||||
| Darson, Jene'ya | NaN | 0 | |||||||||
| Harrell, Ludwick | NaN | 0 | |||||||||
| Lydon, Allison | NaN | 0 | |||||||||
| Motlagh, Dawn | NaN | 0 | |||||||||
| Sander, Kamrin | NaN | 0 | |||||||||
| Sutwell, Barbara | NaN | 0 | |||||||||
| Liebig, Ketsia | Production | Production Technician I | Athwal, Sam | NaN | 0 | ||||||
| Biden, Lowan M | NaN | 0 | |||||||||
| Cierpiszewski, Caroline | NaN | 0 | |||||||||
| Dickinson, Geoff | NaN | 0 | |||||||||
| Ferreira, Violeta | NaN | 0 | |||||||||
| Gold, Shenice | NaN | 0 | |||||||||
| Heitzman, Anthony | NaN | 0 | |||||||||
| Knapp, Bradley J | NaN | 0 | |||||||||
| Mahoney, Lauren | NaN | 0 | |||||||||
| Newman, Richard | NaN | 0 | |||||||||
| Peterson, Kayla | NaN | 0 | |||||||||
| Smith, Sade | NaN | 0 | |||||||||
| Miller, Brannon | Production | Production Technician I | Bachiochi, Linda | NaN | 0 | ||||||
| Billis, Helen | NaN | 0 | |||||||||
| Clukey, Elijian | NaN | 0 | |||||||||
| DiNocco, Lily | NaN | 0 | |||||||||
| Fidelia, Libby | NaN | 0 | |||||||||
| Gonzalez, Cayo | NaN | 0 | |||||||||
| Ivey, Rose | NaN | 0 | |||||||||
| Kretschmer, John | NaN | 0 | |||||||||
| Mangal, Debbie | NaN | 0 | |||||||||
| Ngodup, Shari | NaN | 0 | |||||||||
| Robinson, Elias | NaN | 0 | |||||||||
| Sparks, Taylor | NaN | 0 | |||||||||
| Tippett, Jeanette | NaN | 0 | |||||||||
| Spirea, Kelley | Production | Production Technician I | Barone, Francesco A | NaN | 0 | ||||||
| Carey, Michael | NaN | 0 | |||||||||
| Cornett, Lisa | NaN | 0 | |||||||||
| Engdahl, Jean | NaN | 0 | |||||||||
| England, Rex | NaN | 0 | |||||||||
| Gaul, Barbara | NaN | 0 | |||||||||
| Jhaveri, Sneha | NaN | 0 | |||||||||
| Osturnka, Adeel | NaN | 0 | |||||||||
| Punjabhi, Louis | NaN | 0 | |||||||||
| Saar-Beckles, Melinda | NaN | 0 | |||||||||
| Stoica, Rick | NaN | 0 | |||||||||
| Stanley, David | Production | Production Technician I | Cockel, James | NaN | 0 | ||||||
| Dobrin, Denisa S | NaN | 0 | |||||||||
| Garcia, Raul | NaN | 0 | |||||||||
| Gordon, David | NaN | 0 | |||||||||
| Jackson, Maryellen | NaN | 0 | |||||||||
| Langton, Enrico | NaN | 0 | |||||||||
| Maurice, Shana | NaN | 0 | |||||||||
| Nguyen, Lei-Ming | NaN | 0 | |||||||||
| Pitt, Brad | NaN | 0 | |||||||||
| Rose, Ashley | NaN | 0 | |||||||||
| Trang, Mei | NaN | 0 | |||||||||
| Zima, Colleen | NaN | 0 | |||||||||
| Sullivan, Kissy | Production | Production Technician I | Bugali, Josephine | NaN | 0 | ||||||
| Garneau, Hamish | NaN | 0 | |||||||||
| Goyal, Roxana | NaN | 0 | |||||||||
| Jacobi, Hannah | NaN | 0 | |||||||||
| Mckenna, Sandy | NaN | 0 | |||||||||
| Stanford, Barbara M | NaN | 0 | |||||||||
| Albert, Michael | Production | Production Technician II | Blount, Dianna | NaN | 0 | ||||||
| Erilus, Angela | NaN | 0 | |||||||||
| Moumanil, Maliki | NaN | 0 | |||||||||
| Butler, Webster L | Production | Production Technician II | Buccheri, Joseph | NaN | 0 | ||||||
| Fancett, Nicole | NaN | 0 | |||||||||
| Hutter, Rosalie | NaN | 0 | |||||||||
| Manchester, Robyn | NaN | 0 | |||||||||
| Dunn, Amy | Production | Production Technician II | Burke, Joelle | NaN | 0 | ||||||
| Gray, Elijiah | Production | Production Technician II | Faller, Megan | NaN | 0 | ||||||
| Hunts, Julissa | NaN | 0 | |||||||||
| Lunquist, Lisa | NaN | 0 | |||||||||
| Nowlan, Kristie | NaN | 0 | |||||||||
| Smith, Joe | NaN | 0 | |||||||||
| Liebig, Ketsia | Production | Production Technician II | Burkett, Benjamin | NaN | 0 | ||||||
| Jeannite, Tayana | NaN | 0 | |||||||||
| McCarthy, Brigit | NaN | 0 | |||||||||
| Walker, Roger | NaN | 0 | |||||||||
| Miller, Brannon | Production | Production Technician II | Johnston, Yen | NaN | 0 | ||||||
| Petingill, Shana | NaN | 0 | |||||||||
| Spirea, Kelley | Production | Production Technician II | Beak, Kimberly | NaN | 0 | ||||||
| Hankard, Earnest | NaN | 0 | |||||||||
| Linden, Mathew | NaN | 0 | |||||||||
| Moran, Patrick | NaN | 0 | |||||||||
| Sahoo, Adil | NaN | 0 | |||||||||
| Stanley, David | Production | Production Technician II | Good, Susan | NaN | 0 | ||||||
| Monkfish, Erasumus | NaN | 0 | |||||||||
| Wolk, Hang T | NaN | 0 | |||||||||
| Sullivan, Kissy | Production | Production Technician II | Davis, Daniel | NaN | 0 | ||||||
| Gosciminski, Phylicia | NaN | 0 | |||||||||
| Monterro, Luisa | NaN | 0 | |||||||||
| Woodson, Jason | NaN | 0 | |||||||||
| Sales | Director of Sales | Houlihan, Debra | Sales | Sales Manager | Daneault, Lynn | Area Sales Manager | 13 | ||||
| Smith, John | Area Sales Manager | 11 |
CONCLUSION
Overall, the company structure is strictly hierarchical. It has several peculiarities (which are not unusual and occur fairly often, but may nevertheless indicate some imperfection in the management system):
(conversion factor: annual salary = 2080 × hourly rate)
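The factor 2080 presumably corresponds to a full-time year of 52 weeks at 40 hours per week (52 × 40 = 2080); the source does not state this, so treat it as an assumption. A minimal sketch of the conversion, with the hypothetical names `HOURS_PER_YEAR` and `annual_salary`:

```python
HOURS_PER_WEEK = 40   # assumption: standard full-time week
WEEKS_PER_YEAR = 52
HOURS_PER_YEAR = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 2080


def annual_salary(hourly_rate: float) -> int:
    """Convert an hourly pay rate to a whole-dollar annual salary."""
    return round(hourly_rate * HOURS_PER_YEAR)


print(annual_salary(14.0))
```

Under this factor an hourly rate of 14.00 corresponds to 29120 per year and a rate of 80.00 to 166400.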
# Build a DataFrame for studying the salary distribution of the company's employees
sql_quiery = \
"""
SELECT
"Employee Number",
CAST
("Pay Rate" * 2080 AS INTEGER)
AS usd_per_year
FROM
hr_dataset
WHERE -- only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
"Pay Rate"
;
"""
df_year_payrate = pd.read_sql(sql_quiery, conn)
df_year_payrate
| | Employee Number | usd_per_year |
|---|---|---|
| 0 | 1304055683 | 29120 |
| 1 | 1111030244 | 29120 |
| 2 | 1407069061 | 29120 |
| 3 | 1403066020 | 31200 |
| 4 | 1205033102 | 31200 |
| 5 | 1211051232 | 31200 |
| 6 | 1001109612 | 31200 |
| 7 | 1102024121 | 31200 |
| 8 | 1106026579 | 31200 |
| 9 | 1599991009 | 31200 |
| 10 | 807010161 | 31616 |
| 11 | 1403066069 | 32760 |
| 12 | 1408069882 | 33280 |
| 13 | 1404066949 | 33280 |
| 14 | 1410071137 | 33280 |
| 15 | 1401064562 | 33280 |
| 16 | 1208048062 | 33280 |
| 17 | 1308060366 | 33280 |
| 18 | 1109029366 | 33280 |
| 19 | 1411071212 | 33280 |
| 20 | 1204032927 | 33280 |
| 21 | 1307059817 | 34445 |
| 22 | 1202031618 | 34840 |
| 23 | 1101023679 | 34861 |
| 24 | 1209049259 | 35360 |
| 25 | 1001735072 | 35360 |
| 26 | 710007555 | 35360 |
| 27 | 1408069539 | 35360 |
| 28 | 1304055987 | 35360 |
| 29 | 1110029777 | 35360 |
| 30 | 1011022887 | 35360 |
| 31 | 1307059944 | 35360 |
| 32 | 1104025414 | 37440 |
| 33 | 1302053339 | 37440 |
| 34 | 1309061015 | 39520 |
| 35 | 1501072192 | 39520 |
| 36 | 1203032357 | 39520 |
| 37 | 1301052462 | 39520 |
| 38 | 1412071713 | 39520 |
| 39 | 1002017900 | 39520 |
| 40 | 1201031310 | 39520 |
| 41 | 1305057282 | 40560 |
| 42 | 1311063172 | 41080 |
| 43 | 1404066622 | 41600 |
| 44 | 1106026474 | 41600 |
| 45 | 1201031438 | 41600 |
| 46 | 1409070522 | 41600 |
| 47 | 1101023353 | 41600 |
| 48 | 1410070998 | 41600 |
| 49 | 1401064327 | 41600 |
| 50 | 1408069635 | 41600 |
| 51 | 1311063114 | 41600 |
| 52 | 1109029256 | 41600 |
| 53 | 1501072124 | 41600 |
| 54 | 1101023612 | 43680 |
| 55 | 1311062610 | 43680 |
| 56 | 706006285 | 43680 |
| 57 | 1011022883 | 43680 |
| 58 | 1302053044 | 43680 |
| 59 | 1406068241 | 43680 |
| 60 | 1503072857 | 43680 |
| 61 | 1302053362 | 43680 |
| 62 | 1007020403 | 44200 |
| 63 | 1211050782 | 44720 |
| 64 | 1405067565 | 45760 |
| 65 | 1205033180 | 45760 |
| 66 | 1103024335 | 45760 |
| 67 | 1101023457 | 45760 |
| 68 | 1012023152 | 45760 |
| 69 | 1301052124 | 45760 |
| 70 | 1403066194 | 45760 |
| 71 | 1209048696 | 45760 |
| 72 | 1405067642 | 45760 |
| 73 | 1412071844 | 45760 |
| 74 | 1001970770 | 45760 |
| 75 | 1011022818 | 45760 |
| 76 | 1106026896 | 45760 |
| 77 | 1111030129 | 45760 |
| 78 | 1012023295 | 45760 |
| 79 | 1212051409 | 45760 |
| 80 | 1008020942 | 46800 |
| 81 | 1308060754 | 47840 |
| 82 | 1106026572 | 47840 |
| 83 | 1110029623 | 47840 |
| 84 | 1304055947 | 47840 |
| 85 | 803009012 | 47840 |
| 86 | 1409070255 | 49920 |
| 87 | 1105026041 | 49920 |
| 88 | 1402065085 | 49920 |
| 89 | 1405067064 | 49920 |
| 90 | 1307059937 | 49920 |
| 91 | 909015167 | 49920 |
| 92 | 1105025661 | 49920 |
| 93 | 1006020020 | 49920 |
| 94 | 1312063675 | 49920 |
| 95 | 1012023010 | 50440 |
| 96 | 1001549006 | 50440 |
| 97 | 1501071909 | 50960 |
| 98 | 1407069280 | 51480 |
| 99 | 1306057810 | 52000 |
| 100 | 1001103149 | 52000 |
| 101 | 1011022820 | 52000 |
| 102 | 1201031274 | 52000 |
| 103 | 1106026433 | 52000 |
| 104 | 1408069503 | 54080 |
| 105 | 1103024843 | 54080 |
| 106 | 1406067957 | 54080 |
| 107 | 1301052449 | 54080 |
| 108 | 602000312 | 54080 |
| 109 | 1001504432 | 54288 |
| 110 | 1104025435 | 54891 |
| 111 | 1001644719 | 56160 |
| 112 | 1303054329 | 56160 |
| 113 | 1403066125 | 56160 |
| 114 | 1001956578 | 56160 |
| 115 | 1108028428 | 56160 |
| 116 | 1108028351 | 56160 |
| 117 | 1404066711 | 56160 |
| 118 | 1203032263 | 57179 |
| 119 | 1103024924 | 58240 |
| 120 | 1103024456 | 59280 |
| 121 | 1301052902 | 60299 |
| 122 | 1301052436 | 60320 |
| 123 | 1302053333 | 60320 |
| 124 | 1106026462 | 60320 |
| 125 | 808010278 | 62816 |
| 126 | 1501072093 | 65312 |
| 127 | 1110029732 | 65312 |
| 128 | 1105025718 | 70720 |
| 129 | 1307060188 | 72696 |
| 130 | 1201031308 | 72696 |
| 131 | 1406068403 | 73840 |
| 132 | 1101023540 | 76960 |
| 133 | 1988299991 | 81120 |
| 134 | 1407068885 | 82264 |
| 135 | 1003018246 | 83200 |
| 136 | 1102024173 | 87360 |
| 137 | 1203032255 | 87776 |
| 138 | 1108027853 | 88920 |
| 139 | 1012023013 | 89440 |
| 140 | 1009919940 | 93600 |
| 141 | 1009920000 | 93600 |
| 142 | 1009919990 | 93600 |
| 143 | 1212052023 | 93600 |
| 144 | 1009919980 | 95680 |
| 145 | 906014183 | 97760 |
| 146 | 1107027358 | 99008 |
| 147 | 1201031324 | 99840 |
| 148 | 1411071506 | 102128 |
| 149 | 1012023185 | 102440 |
| 150 | 1009919930 | 104520 |
| 151 | 1409070147 | 106080 |
| 152 | 1009919970 | 106080 |
| 153 | 1102024149 | 108160 |
| 154 | 1009919960 | 108680 |
| 155 | 1000974650 | 110240 |
| 156 | 1107027351 | 110240 |
| 157 | 1308060959 | 110240 |
| 158 | 904013591 | 111904 |
| 159 | 1402065303 | 112320 |
| 160 | 1411071295 | 112320 |
| 161 | 1307060077 | 112320 |
| 162 | 1411071312 | 112528 |
| 163 | 1501072311 | 113360 |
| 164 | 1102024115 | 114400 |
| 165 | 1103024679 | 114400 |
| 166 | 1110029990 | 114400 |
| 167 | 1405067298 | 114400 |
| 168 | 1504073313 | 114400 |
| 169 | 1403065721 | 114400 |
| 170 | 1409070567 | 114400 |
| 171 | 1408069481 | 114400 |
| 172 | 1411071302 | 114400 |
| 173 | 1203032099 | 114400 |
| 174 | 1104025008 | 114400 |
| 175 | 1412071660 | 114400 |
| 176 | 1209049326 | 114400 |
| 177 | 1306057978 | 114400 |
| 178 | 1111030684 | 114400 |
| 179 | 812011761 | 114400 |
| 180 | 1102024106 | 114400 |
| 181 | 1502072711 | 114400 |
| 182 | 1401064637 | 114400 |
| 183 | 1312063714 | 114400 |
| 184 | 1009919950 | 114400 |
| 185 | 1301052347 | 114816 |
| 186 | 1204032843 | 115440 |
| 187 | 1411071481 | 115440 |
| 188 | 1303054625 | 115461 |
| 189 | 1101023577 | 116480 |
| 190 | 1209048771 | 116480 |
| 191 | 1111030503 | 116480 |
| 192 | 1499902910 | 116480 |
| 193 | 1306059197 | 116480 |
| 194 | 1001084890 | 116480 |
| 195 | 1108028108 | 116896 |
| 196 | 1501072180 | 118560 |
| 197 | 1203032498 | 118810 |
| 198 | 1009021646 | 124800 |
| 199 | 1006020066 | 124800 |
| 200 | 1010022337 | 127504 |
| 201 | 1106026933 | 128960 |
| 202 | 1011022863 | 131040 |
| 203 | 1009919920 | 132080 |
| 204 | 1101023754 | 133120 |
| 205 | 1192991000 | 135200 |
| 206 | 1112030816 | 135200 |
| 207 | 1001495124 | 166400 |
# Plot the distribution of employees by salary
g = sns.displot(data=df_year_payrate,
                x="usd_per_year",
                binwidth=2500,
                binrange=(27500, 166400),
                height=10,
                aspect=2,
                color="teal",
                alpha=0.7
                )
# Define the X axis ticks:
scale_step = 10000  # tick step on the X axis (can be changed)
# Main tick range
scale_span = list(range((int(df_year_payrate['usd_per_year'].min() / scale_step) * scale_step),
                        int(df_year_payrate['usd_per_year'].max()),
                        scale_step))
# If necessary, add the smallest and largest values to the ticks (if they are not included already)
if df_year_payrate['usd_per_year'].min() < scale_span[0]:
    scale_span = [df_year_payrate['usd_per_year'].min()] + scale_span
if df_year_payrate['usd_per_year'].max() > scale_span[-1]:
    scale_span = scale_span + [df_year_payrate['usd_per_year'].max()]
plt.xticks(ticks=scale_span, fontsize=14, rotation=90)  # set the X ticks and their label size
plt.yticks(fontsize=14)  # label size of the Y ticks
g.set_xlabels(fontsize=16)  # font size of the X axis title
g.set_ylabels(fontsize=16)  # font size of the Y axis title
# Plot title:
g.fig.suptitle("Overall distribution of employees by annual salary", fontsize=20, y=1.0125)
# Add vertical lines for the median and the arithmetic mean salary.
# To do this, define a function that draws the vertical lines and their labels
def usd_lines(x, **kwargs):
    # Median line, positioned from the values in x; color, width, style
    plt.axvline(x.median(), color='sienna', linewidth=3, linestyle=':')
    # Mean line, positioned from the values in x; color, width, style
    plt.axvline(x.mean(), color='navy', linewidth=3, linestyle='-.')
    # Label for the median line
    plt.annotate(
        text=f"median {x.median()}",  # label text for the median line
        xy=(x.median()-1000, 20),  # label position in data coordinates
        horizontalalignment='center',  # horizontal text alignment
        verticalalignment='top',  # vertical text alignment
        rotation="vertical",  # rotate the label
        color='sienna',  # label color
        alpha=1,
        fontsize=16
    )
    # Label for the mean line
    plt.annotate(
        text=f"mean {x.mean():,.1f}",  # label text for the mean line
        xy=(x.mean()-1000, 20),  # label position in data coordinates
        horizontalalignment='center',  # horizontal text alignment
        verticalalignment='top',  # vertical text alignment
        rotation="vertical",  # rotate the label
        color='navy',  # label color
        alpha=1,
        fontsize=16
    )
# Apply the line-drawing function to the grid,
# passing it the annual salaries
g.map(usd_lines, x=df_year_payrate['usd_per_year'])
plt.show()
CONCLUSION
# Select the performance scores from the hr_dataset table
sql_quiery = \
"""
(SELECT
COALESCE("Performance Score", '[TOTAL]') AS performace_score,
COUNT("Employee Number") AS number_of_employees
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
("Performance Score")
ORDER BY
"Performance Score"
)
;
"""
df_performance = pd.read_sql(sql_quiery, conn)
df_performance
| | performace_score | number_of_employees |
|---|---|---|
| 0 | 90-day meets | 18 |
| 1 | Exceeds | 20 |
| 2 | Exceptional | 9 |
| 3 | Fully Meets | 125 |
| 4 | N/A- too early to review | 24 |
| 5 | Needs Improvement | 7 |
| 6 | PIP | 5 |
| 7 | [TOTAL] | 208 |
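`GROUP BY ROLLUP` appends a grand-total row whose grouping column is NULL, and `COALESCE` relabels that NULL as `[TOTAL]`. The same result can be sketched in plain Python with `collections.Counter`. The list of scores below is rebuilt from the counts in the table above, and the name `rollup_counts` is hypothetical.

```python
from collections import Counter


def rollup_counts(values, total_label="[TOTAL]"):
    """Count each distinct value and append a grand-total row, like GROUP BY ROLLUP."""
    counts = Counter(values)
    rows = sorted(counts.items())                      # one row per distinct value
    rows.append((total_label, sum(counts.values())))   # the extra row ROLLUP adds
    return rows


# Counts copied from the query output; expanded into one entry per employee.
table_counts = {
    "90-day meets": 18, "Exceeds": 20, "Exceptional": 9, "Fully Meets": 125,
    "N/A- too early to review": 24, "Needs Improvement": 7, "PIP": 5,
}
scores = [score for score, n in table_counts.items() for _ in range(n)]
rows = rollup_counts(scores)
print(rows[-1])
```

One caveat of the SQL version: if the score column itself could contain NULLs, `COALESCE` would label those rows `[TOTAL]` as well. PostgreSQL's `GROUPING()` function distinguishes a ROLLUP total from a genuine NULL group.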
# Draw a pie chart
# Data for the chart: every row except the final [TOTAL] row
dfg_performance = df_performance[:-1]
# Create the figure and axes
fig, ax = plt.subplots(figsize=(16, 7), ncols=1, nrows=1)
fig.suptitle("Distribution of current employees by performance score", fontsize=16, y=0.825)
#cmap = plt.colormaps["Paired"] # Choose a color palette (categorical)
#my_colors = cmap(np.arange(6)*1) # Colors for the chart: np.arange(number of segments) * coefficient
cmap = plt.colormaps["PiYG"]  # Choose a color palette (diverging)
my_colors = cmap(np.linspace(0.1, 0.9, dfg_performance.shape[0]))
ax.pie(dfg_performance["number_of_employees"],
labels = dfg_performance["performace_score"],
autopct='%1.1f%%',
textprops={'fontsize':12},
pctdistance=0.8,
radius=0.8,
colors=my_colors
)
plt.show()
CONCLUSION
(in years, using the formula 1 year = 360 days)
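Dividing by 360 is a simplification (a calendar year has 365 or 366 days), so tenure expressed this way is slightly overstated; it is simply the convention used here. A minimal sketch of the conversion with the hypothetical helper `years_employed`, using `Decimal` so that ties round half-up, as PostgreSQL's `ROUND` does for `numeric` values:

```python
from decimal import Decimal, ROUND_HALF_UP

DAYS_PER_YEAR = 360  # the convention stated above, not a calendar year


def years_employed(days: int) -> Decimal:
    """Convert days employed to years, rounded to one decimal place."""
    years = Decimal(days) / Decimal(DAYS_PER_YEAR)
    return years.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(years_employed(4356))
```

Python's built-in `round` on floats uses round-half-to-even, which is why the sketch goes through `Decimal` instead.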
# Build a DataFrame for studying the distribution of current employees by length of service in the company
sql_quiery = \
"""
SELECT
"Employee Number",
ROUND
(CAST("Days Employed" AS numeric) / 360, 1)
AS years_employed
FROM
hr_dataset
WHERE -- only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
"Days Employed"
;"""
df_years_employed = pd.read_sql(sql_quiery, conn)
df_years_employed
| | Employee Number | years_employed |
|---|---|---|
| 0 | 1211050782 | 0.2 |
| 1 | 1009919990 | 0.6 |
| 2 | 1009920000 | 0.6 |
| 3 | 1009919980 | 0.8 |
| 4 | 1009919970 | 0.8 |
| 5 | 1009919960 | 0.8 |
| 6 | 1009919950 | 0.9 |
| 7 | 1009919930 | 1.2 |
| 8 | 1009919940 | 1.2 |
| 9 | 1009919920 | 1.2 |
| 10 | 1303054329 | 1.4 |
| 11 | 1209049326 | 1.4 |
| 12 | 1407069280 | 1.4 |
| 13 | 1311063172 | 1.4 |
| 14 | 1410070998 | 1.4 |
| 15 | 904013591 | 1.4 |
| 16 | 1010022337 | 1.4 |
| 17 | 1106026433 | 1.5 |
| 18 | 1110029623 | 1.6 |
| 19 | 1110029990 | 1.9 |
| 20 | 1102024115 | 1.9 |
| 21 | 1106026474 | 2.4 |
| 22 | 1103024924 | 2.5 |
| 23 | 1307060077 | 2.5 |
| 24 | 1302053339 | 2.6 |
| 25 | 1307059817 | 2.6 |
| 26 | 1406067957 | 2.7 |
| 27 | 1203032255 | 2.7 |
| 28 | 1102024173 | 2.7 |
| 29 | 1105025718 | 2.7 |
| 30 | 1309061015 | 2.7 |
| 31 | 1109029256 | 2.7 |
| 32 | 1301052347 | 2.7 |
| 33 | 1501072192 | 2.7 |
| 34 | 1012023013 | 2.7 |
| 35 | 1110029732 | 2.7 |
| 36 | 1411071506 | 2.7 |
| 37 | 1307060188 | 2.8 |
| 38 | 1203032099 | 2.8 |
| 39 | 1001956578 | 2.8 |
| 40 | 1101023353 | 2.8 |
| 41 | 1407068885 | 2.8 |
| 42 | 812011761 | 2.9 |
| 43 | 1403066069 | 2.9 |
| 44 | 1201031310 | 2.9 |
| 45 | 1003018246 | 2.9 |
| 46 | 808010278 | 2.9 |
| 47 | 1212052023 | 2.9 |
| 48 | 1101023540 | 2.9 |
| 49 | 1988299991 | 2.9 |
| 50 | 1007020403 | 3.1 |
| 51 | 1104025435 | 3.1 |
| 52 | 1308060959 | 3.1 |
| 53 | 1108028108 | 3.1 |
| 54 | 1411071312 | 3.1 |
| 55 | 1107027358 | 3.1 |
| 56 | 1108027853 | 3.1 |
| 57 | 1406068403 | 3.1 |
| 58 | 906014183 | 3.2 |
| 59 | 1205033180 | 3.2 |
| 60 | 1401064637 | 3.2 |
| 61 | 1306057978 | 3.2 |
| 62 | 1302053333 | 3.2 |
| 63 | 1208048062 | 3.2 |
| 64 | 1504073313 | 3.2 |
| 65 | 1302053362 | 3.2 |
| 66 | 1001970770 | 3.2 |
| 67 | 1204032927 | 3.2 |
| 68 | 1101023457 | 3.2 |
| 69 | 1211051232 | 3.2 |
| 70 | 1409070147 | 3.2 |
| 71 | 1403065721 | 3.3 |
| 72 | 1306059197 | 3.3 |
| 73 | 1011022818 | 3.3 |
| 74 | 1201031324 | 3.4 |
| 75 | 1501072124 | 3.4 |
| 76 | 1308060366 | 3.4 |
| 77 | 1012023010 | 3.4 |
| 78 | 1105025661 | 3.4 |
| 79 | 1306057810 | 3.4 |
| 80 | 1108028428 | 3.4 |
| 81 | 1499902910 | 3.6 |
| 82 | 1404066622 | 3.6 |
| 83 | 1302053044 | 3.6 |
| 84 | 706006285 | 3.6 |
| 85 | 1411071481 | 3.6 |
| 86 | 1001084890 | 3.6 |
| 87 | 1305057282 | 3.6 |
| 88 | 1001549006 | 3.6 |
| 89 | 1402065303 | 3.6 |
| 90 | 1009021646 | 3.6 |
| 91 | 1101023612 | 3.7 |
| 92 | 1311063114 | 3.7 |
| 93 | 1106026896 | 3.8 |
| 94 | 1103024335 | 3.8 |
| 95 | 1402065085 | 3.8 |
| 96 | 1304055683 | 3.8 |
| 97 | 1012023152 | 3.8 |
| 98 | 1401064562 | 3.9 |
| 99 | 1209049259 | 3.9 |
| 100 | 1106026572 | 3.9 |
| 101 | 710007555 | 3.9 |
| 102 | 1412071713 | 3.9 |
| 103 | 1101023754 | 3.9 |
| 104 | 1303054625 | 4.1 |
| 105 | 1308060754 | 4.1 |
| 106 | 1501071909 | 4.1 |
| 107 | 1101023577 | 4.1 |
| 108 | 1307059937 | 4.1 |
| 109 | 1408069635 | 4.1 |
| 110 | 1408069539 | 4.1 |
| 111 | 1501072180 | 4.2 |
| 112 | 1103024679 | 4.2 |
| 113 | 1301052124 | 4.2 |
| 114 | 1503072857 | 4.2 |
| 115 | 1301052462 | 4.2 |
| 116 | 1404066949 | 4.2 |
| 117 | 807010161 | 4.2 |
| 118 | 1108028351 | 4.2 |
| 119 | 1104025008 | 4.3 |
| 120 | 1408069882 | 4.3 |
| 121 | 1001504432 | 4.3 |
| 122 | 1312063675 | 4.3 |
| 123 | 1205033102 | 4.3 |
| 124 | 1412071844 | 4.3 |
| 125 | 1405067565 | 4.5 |
| 126 | 1203032357 | 4.5 |
| 127 | 1011022887 | 4.5 |
| 128 | 1111030684 | 4.5 |
| 129 | 1110029777 | 4.5 |
| 130 | 1111030129 | 4.5 |
| 131 | 1599991009 | 4.5 |
| 132 | 1104025414 | 4.5 |
| 133 | 1301052436 | 4.6 |
| 134 | 1001103149 | 4.6 |
| 135 | 1102024121 | 4.7 |
| 136 | 1403066020 | 4.7 |
| 137 | 1409070255 | 4.8 |
| 138 | 1106026933 | 4.9 |
| 139 | 909015167 | 5.0 |
| 140 | 1209048696 | 5.0 |
| 141 | 1012023185 | 5.1 |
| 142 | 1201031438 | 5.1 |
| 143 | 1102024149 | 5.2 |
| 144 | 1301052902 | 5.3 |
| 145 | 1107027351 | 5.4 |
| 146 | 1002017900 | 5.4 |
| 147 | 1304055987 | 5.5 |
| 148 | 1001495124 | 5.5 |
| 149 | 1106026579 | 5.5 |
| 150 | 1001109612 | 5.5 |
| 151 | 1406068241 | 5.6 |
| 152 | 1407069061 | 5.6 |
| 153 | 1209048771 | 5.6 |
| 154 | 1412071660 | 5.7 |
| 155 | 1109029366 | 5.7 |
| 156 | 1202031618 | 5.7 |
| 157 | 1103024843 | 5.7 |
| 158 | 1011022820 | 5.8 |
| 159 | 1111030503 | 5.8 |
| 160 | 1101023679 | 5.9 |
| 161 | 803009012 | 5.9 |
| 162 | 1408069481 | 5.9 |
| 163 | 1410071137 | 5.9 |
| 164 | 1011022863 | 5.9 |
| 165 | 1304055947 | 6.0 |
| 166 | 1203032498 | 6.0 |
| 167 | 1102024106 | 6.0 |
| 168 | 1006020020 | 6.0 |
| 169 | 1408069503 | 6.0 |
| 170 | 1405067064 | 6.1 |
| 171 | 1405067642 | 6.1 |
| 172 | 1201031274 | 6.1 |
| 173 | 1404066711 | 6.1 |
| 174 | 1012023295 | 6.2 |
| 175 | 1411071302 | 6.3 |
| 176 | 1001644719 | 6.4 |
| 177 | 1501072311 | 6.4 |
| 178 | 1411071212 | 6.5 |
| 179 | 1409070522 | 6.5 |
| 180 | 1312063714 | 6.5 |
| 181 | 1008020942 | 6.5 |
| 182 | 1203032263 | 6.6 |
| 183 | 1401064327 | 6.6 |
| 184 | 1105026041 | 6.6 |
| 185 | 1192991000 | 6.7 |
| 186 | 1403066125 | 6.7 |
| 187 | 1301052449 | 6.7 |
| 188 | 1403066194 | 6.7 |
| 189 | 1204032843 | 6.8 |
| 190 | 602000312 | 6.9 |
| 191 | 1111030244 | 7.0 |
| 192 | 1409070567 | 7.0 |
| 193 | 1311062610 | 7.0 |
| 194 | 1411071295 | 7.3 |
| 195 | 1106026462 | 7.3 |
| 196 | 1000974650 | 7.5 |
| 197 | 1501072093 | 7.7 |
| 198 | 1307059944 | 7.7 |
| 199 | 1112030816 | 7.7 |
| 200 | 1212051409 | 8.5 |
| 201 | 1405067298 | 9.0 |
| 202 | 1201031308 | 9.0 |
| 203 | 1006020066 | 9.0 |
| 204 | 1103024456 | 9.2 |
| 205 | 1011022883 | 10.0 |
| 206 | 1001735072 | 10.2 |
| 207 | 1502072711 | 12.1 |
# Создадим график распределения действующих сотрудников по сроку работы в компании в годах
g=sns.displot(data=df_years_employed,
x="years_employed",
binwidth=0.25,
binrange=(0, 12.1),
height=10,
aspect=2,
color="indianred",
alpha=0.7
)
# Определяем шкалу X:
scale_step = 1 # Определяем шаг шкалы X - его можно менять!
# Определяем основной диапазон шкалы
scale_span = list(range((int(df_years_employed['years_employed'].min() / scale_step) * scale_step),
int(df_years_employed['years_employed'].max()),
scale_step))
# При неоходимости дополняем первый и послдений элементы к шкале (если они не вошли в шкалу ранее)
if df_years_employed['years_employed'].min() < scale_span[0]:
scale_span = ([df_years_employed['years_employed'].min()] + scale_span)
if df_years_employed['years_employed'].max() > scale_span[-1]:
scale_span = scale_span + [df_years_employed['years_employed'].max()]
plt.xticks(ticks=scale_span, fontsize=14, rotation=0) # установим шкалу X и размер её обозначений
plt.yticks(fontsize=14) # Установим размер обозначения для шкалы Y
g.set_xlabels(fontsize=16) # Размер подписей шкалы X
g.set_ylabels(fontsize=16) # Размер подписей шкалы Y
# Заголовок графика:
g.fig.suptitle("Общее распеределение действующих сотрудников по сроку работы в компании", fontsize=20, y=1.0125)
# Add vertical lines for the median and the arithmetic mean of years of service.
# To do this, define a function that draws each vertical line and its label
def years_employed_lines(x, **kwargs):
    # Median line built from the received x values: color, width, style
    plt.axvline(x.median(), color='forestgreen', linewidth=3, linestyle=':')
    # Mean line built from the received x values: color, width, style
    plt.axvline(x.mean(), color='navy', linewidth=3, linestyle='-.')
    # Label for the median line
    plt.annotate(
        text=f"median {x.median()}",  # Median line annotation
        xy=(x.median()-0.125, 20),  # Label position in data coordinates
        horizontalalignment='center',  # Horizontal text alignment
        verticalalignment='top',  # Vertical text alignment
        rotation="vertical",  # Label rotation
        color='forestgreen',  # Label color
        alpha=1,
        fontsize=16
    )
    # Label for the mean line
    plt.annotate(
        text=f"mean {x.mean():,.1f}",  # Mean line annotation
        xy=(x.mean()-0.125, 20),  # Label position in data coordinates
        horizontalalignment='center',  # Horizontal text alignment
        verticalalignment='top',  # Vertical text alignment
        rotation="vertical",  # Label rotation
        color='navy',  # Label color
        alpha=1,
        fontsize=16
    )
# Apply the line-drawing function to the grid,
# passing it the years-of-service values
g.map(years_employed_lines, x=df_years_employed['years_employed'])
plt.show()
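The X-axis tick logic above (multiples of the step, plus the exact minimum and maximum) is repeated for several charts in this notebook. A minimal sketch of how it could be wrapped in a helper is shown below; the name `build_ticks` is hypothetical and not part of the original notebook, and the guard for an empty range is an addition that the original code does not have.

```python
def build_ticks(values, step=1):
    """Return axis ticks: multiples of `step` covering the data, plus the exact min and max."""
    lo, hi = min(values), max(values)
    start = int(lo / step) * step              # nearest multiple of step at or below lo (non-negative data)
    ticks = list(range(start, int(hi), step))  # main part of the scale
    if not ticks or lo < ticks[0]:             # guard against an empty range, then add the minimum
        ticks = [lo] + ticks
    if hi > ticks[-1]:                         # add the maximum if it is not on the scale yet
        ticks = ticks + [hi]
    return ticks


# Illustrative values only, not the notebook's data
years_ticks = build_ticks([0.3, 5.0, 12.1], step=1)
age_ticks = build_ticks([25, 40, 67], step=5)
```

With such a helper, each chart would only need a call like `plt.xticks(ticks=build_ticks(df['column'], scale_step))`.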
ВЫВОД
# Create a DataFrame for studying the distribution of employees by hire date.
# The time scale must also include months in which nobody was hired.
sql_quiery = \
"""
-- Build subqueries for the selection from hr_dataset and for a complete time scale running from the earliest
-- hire date to the latest (generate_series()). Join them by month, keeping every row of the complete time
-- scale
WITH
employees_selection AS
(SELECT
"Employee Number",
DATE_TRUNC('month', "Date of Hire") AS month -- приведем даты найма к месяцу (начало месяца)
FROM
hr_dataset
WHERE -- Условие, что работники действующие
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
),
month_of_hire AS
(SELECT
COUNT("Employee Number") AS employee_count,
month
FROM
employees_selection
GROUP BY
month
ORDER BY
month
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire")) FROM hr_dataset), -- выберем минимальную дату приема на работу
(SELECT DATE_TRUNC('month', MAX("Date of Hire")) FROM hr_dataset), -- выберем максимальную дату приема на работу
'1 month') AS month -- выберем месячный интервал
)
SELECT * FROM
months
LEFT OUTER JOIN
month_of_hire
USING(month)
;
"""
df_date_hire = pd.read_sql(sql_quiery, conn)
df_date_hire
| | month | employee_count |
|---|---|---|
| 0 | 2006-01-01 00:00:00+00:00 | 1.0 |
| 1 | 2006-02-01 00:00:00+00:00 | NaN |
| 2 | 2006-03-01 00:00:00+00:00 | NaN |
| 3 | 2006-04-01 00:00:00+00:00 | NaN |
| 4 | 2006-05-01 00:00:00+00:00 | NaN |
| 5 | 2006-06-01 00:00:00+00:00 | NaN |
| 6 | 2006-07-01 00:00:00+00:00 | NaN |
| 7 | 2006-08-01 00:00:00+00:00 | NaN |
| 8 | 2006-09-01 00:00:00+00:00 | NaN |
| 9 | 2006-10-01 00:00:00+00:00 | NaN |
| 10 | 2006-11-01 00:00:00+00:00 | NaN |
| 11 | 2006-12-01 00:00:00+00:00 | NaN |
| 12 | 2007-01-01 00:00:00+00:00 | NaN |
| 13 | 2007-02-01 00:00:00+00:00 | NaN |
| 14 | 2007-03-01 00:00:00+00:00 | NaN |
| 15 | 2007-04-01 00:00:00+00:00 | NaN |
| 16 | 2007-05-01 00:00:00+00:00 | NaN |
| 17 | 2007-06-01 00:00:00+00:00 | NaN |
| 18 | 2007-07-01 00:00:00+00:00 | NaN |
| 19 | 2007-08-01 00:00:00+00:00 | NaN |
| 20 | 2007-09-01 00:00:00+00:00 | NaN |
| 21 | 2007-10-01 00:00:00+00:00 | NaN |
| 22 | 2007-11-01 00:00:00+00:00 | 1.0 |
| 23 | 2007-12-01 00:00:00+00:00 | NaN |
| 24 | 2008-01-01 00:00:00+00:00 | 1.0 |
| 25 | 2008-02-01 00:00:00+00:00 | NaN |
| 26 | 2008-03-01 00:00:00+00:00 | NaN |
| 27 | 2008-04-01 00:00:00+00:00 | NaN |
| 28 | 2008-05-01 00:00:00+00:00 | NaN |
| 29 | 2008-06-01 00:00:00+00:00 | NaN |
| 30 | 2008-07-01 00:00:00+00:00 | NaN |
| 31 | 2008-08-01 00:00:00+00:00 | NaN |
| 32 | 2008-09-01 00:00:00+00:00 | NaN |
| 33 | 2008-10-01 00:00:00+00:00 | 1.0 |
| 34 | 2008-11-01 00:00:00+00:00 | NaN |
| 35 | 2008-12-01 00:00:00+00:00 | NaN |
| 36 | 2009-01-01 00:00:00+00:00 | 3.0 |
| 37 | 2009-02-01 00:00:00+00:00 | NaN |
| 38 | 2009-03-01 00:00:00+00:00 | NaN |
| 39 | 2009-04-01 00:00:00+00:00 | NaN |
| 40 | 2009-05-01 00:00:00+00:00 | NaN |
| 41 | 2009-06-01 00:00:00+00:00 | NaN |
| 42 | 2009-07-01 00:00:00+00:00 | 1.0 |
| 43 | 2009-08-01 00:00:00+00:00 | NaN |
| 44 | 2009-09-01 00:00:00+00:00 | NaN |
| 45 | 2009-10-01 00:00:00+00:00 | NaN |
| 46 | 2009-11-01 00:00:00+00:00 | NaN |
| 47 | 2009-12-01 00:00:00+00:00 | NaN |
| 48 | 2010-01-01 00:00:00+00:00 | NaN |
| 49 | 2010-02-01 00:00:00+00:00 | NaN |
| 50 | 2010-03-01 00:00:00+00:00 | NaN |
| 51 | 2010-04-01 00:00:00+00:00 | 2.0 |
| 52 | 2010-05-01 00:00:00+00:00 | 1.0 |
| 53 | 2010-06-01 00:00:00+00:00 | NaN |
| 54 | 2010-07-01 00:00:00+00:00 | 1.0 |
| 55 | 2010-08-01 00:00:00+00:00 | 1.0 |
| 56 | 2010-09-01 00:00:00+00:00 | 1.0 |
| 57 | 2010-10-01 00:00:00+00:00 | NaN |
| 58 | 2010-11-01 00:00:00+00:00 | NaN |
| 59 | 2010-12-01 00:00:00+00:00 | NaN |
| 60 | 2011-01-01 00:00:00+00:00 | 4.0 |
| 61 | 2011-02-01 00:00:00+00:00 | NaN |
| 62 | 2011-03-01 00:00:00+00:00 | 1.0 |
| 63 | 2011-04-01 00:00:00+00:00 | 4.0 |
| 64 | 2011-05-01 00:00:00+00:00 | 2.0 |
| 65 | 2011-06-01 00:00:00+00:00 | 1.0 |
| 66 | 2011-07-01 00:00:00+00:00 | 4.0 |
| 67 | 2011-08-01 00:00:00+00:00 | 2.0 |
| 68 | 2011-09-01 00:00:00+00:00 | 1.0 |
| 69 | 2011-10-01 00:00:00+00:00 | 1.0 |
| 70 | 2011-11-01 00:00:00+00:00 | 4.0 |
| 71 | 2011-12-01 00:00:00+00:00 | NaN |
| 72 | 2012-01-01 00:00:00+00:00 | 5.0 |
| 73 | 2012-02-01 00:00:00+00:00 | 5.0 |
| 74 | 2012-03-01 00:00:00+00:00 | 2.0 |
| 75 | 2012-04-01 00:00:00+00:00 | 4.0 |
| 76 | 2012-05-01 00:00:00+00:00 | 3.0 |
| 77 | 2012-06-01 00:00:00+00:00 | NaN |
| 78 | 2012-07-01 00:00:00+00:00 | 4.0 |
| 79 | 2012-08-01 00:00:00+00:00 | 2.0 |
| 80 | 2012-09-01 00:00:00+00:00 | 1.0 |
| 81 | 2012-10-01 00:00:00+00:00 | 1.0 |
| 82 | 2012-11-01 00:00:00+00:00 | 2.0 |
| 83 | 2012-12-01 00:00:00+00:00 | NaN |
| 84 | 2013-01-01 00:00:00+00:00 | 3.0 |
| 85 | 2013-02-01 00:00:00+00:00 | 1.0 |
| 86 | 2013-03-01 00:00:00+00:00 | NaN |
| 87 | 2013-04-01 00:00:00+00:00 | 2.0 |
| 88 | 2013-05-01 00:00:00+00:00 | 2.0 |
| 89 | 2013-06-01 00:00:00+00:00 | NaN |
| 90 | 2013-07-01 00:00:00+00:00 | 8.0 |
| 91 | 2013-08-01 00:00:00+00:00 | 6.0 |
| 92 | 2013-09-01 00:00:00+00:00 | 8.0 |
| 93 | 2013-10-01 00:00:00+00:00 | NaN |
| 94 | 2013-11-01 00:00:00+00:00 | 7.0 |
| 95 | 2013-12-01 00:00:00+00:00 | NaN |
| 96 | 2014-01-01 00:00:00+00:00 | 6.0 |
| 97 | 2014-02-01 00:00:00+00:00 | 5.0 |
| 98 | 2014-03-01 00:00:00+00:00 | 2.0 |
| 99 | 2014-04-01 00:00:00+00:00 | NaN |
| 100 | 2014-05-01 00:00:00+00:00 | 10.0 |
| 101 | 2014-06-01 00:00:00+00:00 | NaN |
| 102 | 2014-07-01 00:00:00+00:00 | 7.0 |
| 103 | 2014-08-01 00:00:00+00:00 | 3.0 |
| 104 | 2014-09-01 00:00:00+00:00 | 13.0 |
| 105 | 2014-10-01 00:00:00+00:00 | NaN |
| 106 | 2014-11-01 00:00:00+00:00 | 8.0 |
| 107 | 2014-12-01 00:00:00+00:00 | NaN |
| 108 | 2015-01-01 00:00:00+00:00 | 8.0 |
| 109 | 2015-02-01 00:00:00+00:00 | 6.0 |
| 110 | 2015-03-01 00:00:00+00:00 | 11.0 |
| 111 | 2015-04-01 00:00:00+00:00 | NaN |
| 112 | 2015-05-01 00:00:00+00:00 | 2.0 |
| 113 | 2015-06-01 00:00:00+00:00 | 2.0 |
| 114 | 2015-07-01 00:00:00+00:00 | 1.0 |
| 115 | 2015-08-01 00:00:00+00:00 | NaN |
| 116 | 2015-09-01 00:00:00+00:00 | NaN |
| 117 | 2015-10-01 00:00:00+00:00 | NaN |
| 118 | 2015-11-01 00:00:00+00:00 | NaN |
| 119 | 2015-12-01 00:00:00+00:00 | NaN |
| 120 | 2016-01-01 00:00:00+00:00 | 2.0 |
| 121 | 2016-02-01 00:00:00+00:00 | NaN |
| 122 | 2016-03-01 00:00:00+00:00 | NaN |
| 123 | 2016-04-01 00:00:00+00:00 | NaN |
| 124 | 2016-05-01 00:00:00+00:00 | 1.0 |
| 125 | 2016-06-01 00:00:00+00:00 | 3.0 |
| 126 | 2016-07-01 00:00:00+00:00 | 5.0 |
| 127 | 2016-08-01 00:00:00+00:00 | NaN |
| 128 | 2016-09-01 00:00:00+00:00 | 1.0 |
| 129 | 2016-10-01 00:00:00+00:00 | 2.0 |
| 130 | 2016-11-01 00:00:00+00:00 | NaN |
| 131 | 2016-12-01 00:00:00+00:00 | NaN |
| 132 | 2017-01-01 00:00:00+00:00 | 1.0 |
| 133 | 2017-02-01 00:00:00+00:00 | 3.0 |
| 134 | 2017-03-01 00:00:00+00:00 | NaN |
| 135 | 2017-04-01 00:00:00+00:00 | 2.0 |
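The NaN values in the table mark months of the full time scale in which no current employee was hired. For plotting it may be more convenient to treat them as zeros (in SQL this could be done with `COALESCE(employee_count, 0)`). The sketch below reproduces the same idea in plain Python on hypothetical hire dates: build every month between the earliest and latest hire, then count hires per month with zero as the default. The function names are illustrative and do not come from the notebook.

```python
from collections import Counter
from datetime import date


def month_range(first, last):
    """Yield the first day of every month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def hires_per_month(hire_dates):
    """Count hires per month, keeping months without hires as explicit zeros."""
    counts = Counter(d.replace(day=1) for d in hire_dates)  # truncate each date to its month
    return {m: counts.get(m, 0) for m in month_range(min(hire_dates), max(hire_dates))}


# Hypothetical hire dates, not taken from hr_dataset
sample_dates = [date(2006, 1, 9), date(2006, 4, 3), date(2006, 4, 17)]
monthly = hires_per_month(sample_dates)
```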
# Создадим график распределения сотрудников по месяцу и году найма в компанию
g=sns.relplot(data=df_date_hire,
x="month",
y="employee_count",
height=4,
aspect=3.5,
color="indianred",
alpha=0.9,
ci=False,
kind="line"
)
# Определим шкалу как временной диапазон в формате Pandas с квартальными интервалами (для читаемости):
scale_span = pd.date_range(start=df_date_hire['month'].min(),
end= df_date_hire['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # установим шкалу X и размер её обозначений
plt.yticks(fontsize=14) # Установим размер обозначения для шкалы Y
# Определим формат подписей шкалы как YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=14) # Размер подписей шкалы X
g.set_ylabels("Employees hired", fontsize=14) # Размер подписей шкалы Y
# Заголовок графика:
g.fig.suptitle("Overall distribution of current employees by month and year of hire", fontsize=16, y=1.025)
plt.show()
ВЫВОД
# Select the recruitment sources from the hr_dataset table
sql_quiery = \
"""
(SELECT
COALESCE("Employee Source", '[TOTAL]') AS employee_source,
COUNT("Employee Number") AS number_of_employees
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
("Employee Source")
ORDER BY
number_of_employees DESC
)
;
"""
df_empl_source = pd.read_sql(sql_quiery, conn)
df_empl_source
| | employee_source | number_of_employees |
|---|---|---|
| 0 | [TOTAL] | 208 |
| 1 | Employee Referral | 27 |
| 2 | Pay Per Click - Google | 18 |
| 3 | Professional Society | 17 |
| 4 | Diversity Job Fair | 13 |
| 5 | MBTA ads | 13 |
| 6 | Newspager/Magazine | 13 |
| 7 | Monster.com | 13 |
| 8 | Website Banner Ads | 12 |
| 9 | On-campus Recruiting | 11 |
| 10 | Vendor Referral | 11 |
| 11 | Billboard | 11 |
| 12 | Search Engine - Google Bing Yahoo | 10 |
| 13 | Glassdoor | 8 |
| 14 | Indeed | 8 |
| 15 | Word of Mouth | 6 |
| 16 | Other | 6 |
| 17 | Internet Search | 4 |
| 18 | Information Session | 3 |
| 19 | Social Networks - Facebook Twitter etc | 3 |
| 20 | Careerbuilder | 1 |
# Build a pie chart
# Select the data to plot (drop the leading [TOTAL] row)
dfg_empl_source = df_empl_source[1:]
# Create the figure and axes for the chart
fig, ax = plt.subplots(figsize=(16, 7), ncols=1, nrows=1)
fig.suptitle("Distribution of current employees by recruitment source", fontsize=16, y=1.125)
#cmap = plt.colormaps["Paired"] # Выберем цветовую палитру (категориальную)
#my_colors = cmap(np.arange(6)*1) # Определим цвета для графика np.arrange(число сегментов)*коэффициент
cmap = plt.colormaps["terrain"] # Выберем цветовую палитру (последовательную)
my_colors =cmap(np.linspace(0.125, 0.875, dfg_empl_source.shape[0]))
ax.pie(dfg_empl_source["number_of_employees"],
labels = dfg_empl_source["employee_source"],
autopct='%1.1f%%',
textprops={'fontsize':12},
pctdistance=0.9,
radius=1.5,
colors=my_colors
)
plt.show()
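The charts in this notebook drop the `[TOTAL]` row by position (`[1:]` here, `[:-1]` elsewhere), which works only while the `ORDER BY` clause keeps that row first or last. A small sketch of separating the total by its label instead is given below; it uses plain Python tuples with three rows copied from the table above, and the variable names are illustrative.

```python
TOTAL_LABEL = "[TOTAL]"

# Three rows copied from the result table above: (employee_source, number_of_employees)
rows = [
    ("[TOTAL]", 208),
    ("Employee Referral", 27),
    ("Pay Per Click - Google", 18),
]

# Pick the total by label, so the row order no longer matters
total = next(count for label, count in rows if label == TOTAL_LABEL)
detail = [(label, count) for label, count in rows if label != TOTAL_LABEL]

# Share of each source in the total, in percent
shares = {label: round(100 * count / total, 1) for label, count in detail}
```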
ВЫВОД
Note
Unfortunately, the data in the recruiting_costs table are not tied to any time period, so their absolute values are practically impossible to relate to the staff data. IF WE ASSUME that these figures are cumulative over the same period that hr_dataset covers, the cost of hiring one employee can be calculated CONDITIONALLY; the result of that calculation is given below. To stress it once more: the conclusions would hold only if recruiting_costs refers to the same period as hr_dataset. In any case, these figures give only an indirect idea of how expensive each recruitment source is relative to the others.
sql_quiery = \
"""
WITH
empl_source_count AS
(SELECT
"Employee Source" AS empl_source,
COUNT("Employee Number")
FROM hr_dataset
GROUP BY
"Employee Source"
),
empl_source_price AS
(SELECT
"Employment Source" AS empl_source,
"Total"
FROM
recruiting_costs
),
empl_count_per_source_price AS
(SELECT
*
FROM
(empl_source_count
LEFT JOIN
empl_source_price
USING (empl_source)
)
)
SELECT
empl_source AS "Recruitment Source",
count AS "Employee Count",
"Total" AS "Total per Source",
ROUND("Total" / (SUM(count) OVER (PARTITION BY empl_source)), 2) AS "Per Employee"
FROM
empl_count_per_source_price
ORDER BY
"Per Employee" DESC
"""
df_recruiting_costs_per_employee = pd.read_sql(sql_quiery, conn)
df_recruiting_costs_per_employee
| | Recruitment Source | Employee Count | Total per Source | Per Employee |
|---|---|---|---|---|
| 0 | Indeed | 8 | NaN | NaN |
| 1 | Careerbuilder | 1 | 7790.0 | 7790.00 |
| 2 | Pay Per Click | 1 | 1323.0 | 1323.00 |
| 3 | MBTA ads | 17 | 10980.0 | 645.88 |
| 4 | On-campus Recruiting | 12 | 7500.0 | 625.00 |
| 5 | Website Banner Ads | 13 | 7143.0 | 549.46 |
| 6 | Social Networks - Facebook Twitter etc | 11 | 5573.0 | 506.64 |
| 7 | Newspager/Magazine | 18 | 8291.0 | 460.61 |
| 8 | Other | 9 | 3995.0 | 443.89 |
| 9 | Billboard | 16 | 6192.0 | 387.00 |
| 10 | Diversity Job Fair | 29 | 10021.0 | 345.55 |
| 11 | Monster.com | 24 | 5760.0 | 240.00 |
| 12 | Search Engine - Google Bing Yahoo | 25 | 5183.0 | 207.32 |
| 13 | Pay Per Click - Google | 21 | 3509.0 | 167.10 |
| 14 | Professional Society | 20 | 1200.0 | 60.00 |
| 15 | Employee Referral | 31 | 0.0 | 0.00 |
| 16 | On-line Web application | 1 | 0.0 | 0.00 |
| 17 | Information Session | 4 | 0.0 | 0.00 |
| 18 | Internet Search | 6 | 0.0 | 0.00 |
| 19 | Vendor Referral | 15 | 0.0 | 0.00 |
| 20 | Company Intranet - Partner | 1 | 0.0 | 0.00 |
| 21 | Glassdoor | 14 | 0.0 | 0.00 |
| 22 | Word of Mouth | 13 | 0.0 | 0.00 |
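The "Per Employee" column is simply the total spend on a source divided by the number of employees hired through it. The sketch below re-checks a few rows of the table in plain Python and shows one possible way to treat the two special cases: a source with no cost record (Indeed) and a source with zero cost. The helper `cost_per_hire` is hypothetical and is not part of the notebook.

```python
def cost_per_hire(total_cost, hires):
    """Return the cost per hired employee rounded to 2 decimals, or None if it cannot be computed."""
    if total_cost is None or not hires:  # no cost record, or nobody hired
        return None
    return round(total_cost / hires, 2)


# Rows copied from the result table above: (source, employee count, total per source)
rows = [
    ("MBTA ads", 17, 10980.0),
    ("On-campus Recruiting", 12, 7500.0),
    ("Indeed", 8, None),            # no matching row in recruiting_costs
    ("Employee Referral", 31, 0.0),
]

per_hire = {source: cost_per_hire(total, hires) for source, hires, total in rows}
```

Under the assumption stated in the note above, a missing value means "unknown", while 0.00 means the source is recorded as free; the two should not be merged when interpreting the chart.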
dfg_recruiting_costs_per_employee = df_recruiting_costs_per_employee.dropna()
labels=dfg_recruiting_costs_per_employee["Recruitment Source"].tolist()
# Установим параметры matplotlib для графиков
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 5), sharex=True)
fig.suptitle("Источники найма и стоимость найма в расчёте на сотрудника (USD)", fontsize=16, y=1.05)
# Plot the number of hired employees by recruitment source
sns.barplot(x="Recruitment Source",
y="Employee Count",
data=dfg_recruiting_costs_per_employee,
palette="cool",
ax=ax1)
ax1.set_ylabel("Employee Count")
ax1.set_xlabel(None) # уберём подпись под осью X для графика ax1
# Plot the cost of hiring one employee by recruitment source
sns.barplot(x="Recruitment Source",
y="Per Employee",
data=dfg_recruiting_costs_per_employee,
palette="winter",
ax=ax2)
ax2.set_ylabel("USD per Employee")
ax2.set_yscale('log') # для читаемости установим логарифмическую шкалу
ax2.set_xticklabels(labels = labels, rotation=60,fontsize=10) # установим шкалу X c поворотом и размер её обозначений
plt.show()
ВЫВОД
# Create a DataFrame for studying the age distribution of the company's employees
sql_quiery = \
"""
SELECT
"Employee Number",
age
FROM
hr_dataset
WHERE -- Условие, что работники действующие
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
age
;
"""
df_age = pd.read_sql(sql_quiery, conn)
df_age
| | Employee Number | age |
|---|---|---|
| 0 | 1408069539 | 25 |
| 1 | 1404066711 | 25 |
| 2 | 1103024924 | 26 |
| 3 | 1312063714 | 27 |
| 4 | 1501072192 | 27 |
| 5 | 1408069882 | 27 |
| 6 | 1403066125 | 27 |
| 7 | 1402065303 | 28 |
| 8 | 1102024173 | 28 |
| 9 | 1111030503 | 28 |
| 10 | 1308060366 | 28 |
| 11 | 1501072180 | 28 |
| 12 | 1009920000 | 28 |
| 13 | 1105025661 | 28 |
| 14 | 1203032099 | 28 |
| 15 | 1302053339 | 28 |
| 16 | 1409070567 | 29 |
| 17 | 1011022883 | 29 |
| 18 | 1411071295 | 29 |
| 19 | 1102024106 | 29 |
| 20 | 1111030684 | 29 |
| 21 | 1412071660 | 29 |
| 22 | 1406068403 | 29 |
| 23 | 1306059197 | 29 |
| 24 | 1101023540 | 29 |
| 25 | 1307059937 | 29 |
| 26 | 1302053362 | 29 |
| 27 | 602000312 | 29 |
| 28 | 1103024456 | 30 |
| 29 | 1307059817 | 30 |
| 30 | 1212052023 | 30 |
| 31 | 1012023013 | 30 |
| 32 | 1012023295 | 30 |
| 33 | 1402065085 | 30 |
| 34 | 1303054625 | 30 |
| 35 | 1009919930 | 30 |
| 36 | 1009919990 | 30 |
| 37 | 909015167 | 31 |
| 38 | 1408069481 | 31 |
| 39 | 1003018246 | 31 |
| 40 | 1304055987 | 31 |
| 41 | 803009012 | 31 |
| 42 | 1011022863 | 31 |
| 43 | 1012023185 | 31 |
| 44 | 1202031618 | 31 |
| 45 | 1101023577 | 31 |
| 46 | 1311063172 | 31 |
| 47 | 1302053333 | 31 |
| 48 | 1209049259 | 31 |
| 49 | 1008020942 | 31 |
| 50 | 1406067957 | 31 |
| 51 | 1307060188 | 31 |
| 52 | 1203032255 | 31 |
| 53 | 1010022337 | 31 |
| 54 | 1106026462 | 32 |
| 55 | 1301052902 | 32 |
| 56 | 1211050782 | 32 |
| 57 | 1108028108 | 32 |
| 58 | 1104025435 | 32 |
| 59 | 1001549006 | 32 |
| 60 | 1205033102 | 32 |
| 61 | 1309061015 | 32 |
| 62 | 1106026474 | 33 |
| 63 | 1209049326 | 33 |
| 64 | 1106026896 | 33 |
| 65 | 1499902910 | 33 |
| 66 | 1203032357 | 33 |
| 67 | 1102024115 | 33 |
| 68 | 1108027853 | 33 |
| 69 | 1102024121 | 33 |
| 70 | 1012023010 | 33 |
| 71 | 1205033180 | 33 |
| 72 | 1011022887 | 33 |
| 73 | 1101023679 | 34 |
| 74 | 1105025718 | 34 |
| 75 | 1111030129 | 34 |
| 76 | 1409070522 | 34 |
| 77 | 1106026433 | 34 |
| 78 | 1108028351 | 34 |
| 79 | 1101023457 | 34 |
| 80 | 1106026572 | 34 |
| 81 | 1009919980 | 34 |
| 82 | 1105026041 | 34 |
| 83 | 1110029990 | 34 |
| 84 | 1002017900 | 34 |
| 85 | 904013591 | 35 |
| 86 | 1006020066 | 35 |
| 87 | 1301052124 | 35 |
| 88 | 1311063114 | 35 |
| 89 | 706006285 | 35 |
| 90 | 1406068241 | 35 |
| 91 | 1111030244 | 35 |
| 92 | 1201031310 | 35 |
| 93 | 1012023152 | 35 |
| 94 | 1201031324 | 35 |
| 95 | 812011761 | 36 |
| 96 | 1103024679 | 36 |
| 97 | 1009919970 | 36 |
| 98 | 1407068885 | 36 |
| 99 | 1107027351 | 36 |
| 100 | 1001504432 | 36 |
| 101 | 1307060077 | 36 |
| 102 | 1001735072 | 36 |
| 103 | 906014183 | 37 |
| 104 | 1407069280 | 37 |
| 105 | 1988299991 | 37 |
| 106 | 1192991000 | 37 |
| 107 | 1006020020 | 37 |
| 108 | 1112030816 | 38 |
| 109 | 1104025008 | 38 |
| 110 | 1311062610 | 38 |
| 111 | 1201031274 | 38 |
| 112 | 1203032498 | 38 |
| 113 | 1204032927 | 38 |
| 114 | 1104025414 | 38 |
| 115 | 1009919940 | 38 |
| 116 | 1103024335 | 38 |
| 117 | 1011022820 | 38 |
| 118 | 1209048696 | 39 |
| 119 | 1106026579 | 39 |
| 120 | 1107027358 | 39 |
| 121 | 1001956578 | 39 |
| 122 | 1201031308 | 39 |
| 123 | 1110029777 | 39 |
| 124 | 1101023612 | 39 |
| 125 | 1103024843 | 39 |
| 126 | 1305057282 | 39 |
| 127 | 1405067565 | 39 |
| 128 | 1110029732 | 39 |
| 129 | 1211051232 | 39 |
| 130 | 1001109612 | 39 |
| 131 | 1108028428 | 39 |
| 132 | 1301052449 | 40 |
| 133 | 1401064327 | 40 |
| 134 | 1301052347 | 40 |
| 135 | 1405067298 | 40 |
| 136 | 1599991009 | 40 |
| 137 | 1304055683 | 40 |
| 138 | 1302053044 | 41 |
| 139 | 1109029256 | 41 |
| 140 | 1404066949 | 41 |
| 141 | 1403066069 | 41 |
| 142 | 1110029623 | 41 |
| 143 | 1408069503 | 41 |
| 144 | 1304055947 | 41 |
| 145 | 1405067064 | 41 |
| 146 | 1011022818 | 42 |
| 147 | 1102024149 | 42 |
| 148 | 1000974650 | 42 |
| 149 | 1306057978 | 42 |
| 150 | 1101023353 | 42 |
| 151 | 1301052462 | 42 |
| 152 | 1504073313 | 42 |
| 153 | 807010161 | 43 |
| 154 | 1301052436 | 43 |
| 155 | 1308060754 | 43 |
| 156 | 710007555 | 43 |
| 157 | 1312063675 | 43 |
| 158 | 1404066622 | 44 |
| 159 | 1409070147 | 44 |
| 160 | 1007020403 | 44 |
| 161 | 1307059944 | 44 |
| 162 | 1001084890 | 44 |
| 163 | 1106026933 | 45 |
| 164 | 1009919950 | 45 |
| 165 | 1405067642 | 45 |
| 166 | 1203032263 | 45 |
| 167 | 1201031438 | 45 |
| 168 | 1009919920 | 46 |
| 169 | 1403066194 | 47 |
| 170 | 1209048771 | 47 |
| 171 | 1101023754 | 47 |
| 172 | 1109029366 | 47 |
| 173 | 1001103149 | 48 |
| 174 | 808010278 | 48 |
| 175 | 1408069635 | 48 |
| 176 | 1212051409 | 48 |
| 177 | 1411071212 | 48 |
| 178 | 1009919960 | 48 |
| 179 | 1306057810 | 48 |
| 180 | 1501072311 | 49 |
| 181 | 1407069061 | 49 |
| 182 | 1411071506 | 49 |
| 183 | 1410071137 | 49 |
| 184 | 1204032843 | 49 |
| 185 | 1501072093 | 49 |
| 186 | 1410070998 | 50 |
| 187 | 1502072711 | 50 |
| 188 | 1001644719 | 51 |
| 189 | 1403066020 | 51 |
| 190 | 1501072124 | 51 |
| 191 | 1409070255 | 51 |
| 192 | 1308060959 | 52 |
| 193 | 1009021646 | 52 |
| 194 | 1303054329 | 52 |
| 195 | 1503072857 | 52 |
| 196 | 1401064637 | 53 |
| 197 | 1501071909 | 53 |
| 198 | 1412071713 | 54 |
| 199 | 1411071302 | 54 |
| 200 | 1001970770 | 54 |
| 201 | 1403065721 | 55 |
| 202 | 1401064562 | 56 |
| 203 | 1412071844 | 59 |
| 204 | 1001495124 | 63 |
| 205 | 1411071481 | 63 |
| 206 | 1411071312 | 66 |
| 207 | 1208048062 | 67 |
# Создадим график распределения сотрудников по возрасту
g=sns.displot(data=df_age,
x="age",
bins=42, # ширина 1 bin-а = 1 год
binwidth=1,
height=10,
aspect=2,
color="coral",
alpha=0.8
)
# Определяем шкалу X:
scale_step = 5 # Определяем шаг шкалы X - его можно менять!
# Определяем основной диапазон шкалы
scale_span = list(range((int(df_age['age'].min() / scale_step) * scale_step),
int(df_age['age'].max()),
scale_step))
# If necessary, add the first and last values to the scale (if they are not on it already)
if df_age['age'].min() < scale_span[0]:
    scale_span = [df_age['age'].min()] + scale_span
if df_age['age'].max() > scale_span[-1]:
    scale_span = scale_span + [df_age['age'].max()]
plt.xticks(ticks=scale_span, fontsize=14) # установим шкалу X и размер её обозначений
plt.yticks(fontsize=14) # Установим размер обозначения для шкалы Y
g.set_xlabels(fontsize=16) # Размер подписей шкалы X
g.set_ylabels(fontsize=16) # Размер подписей шкалы Y
# Заголовок графика:
g.fig.suptitle("Overall age distribution of employees", fontsize=20, y=1.0125)
# Add vertical lines for the median and the arithmetic mean of age.
# To do this, define a function that draws each vertical line and its label
def age_lines(x, **kwargs):
    # Median line built from the received x values: color, width, style
    plt.axvline(x.median(), color='darkslategrey', linewidth=3, linestyle=':')
    # Mean line built from the received x values: color, width, style
    plt.axvline(x.mean(), color='navy', linewidth=3, linestyle='-.')
    # Label for the median line
    plt.annotate(
        text=f"median {x.median()}",  # Median line annotation
        xy=(x.median()-0.275, 16),  # Label position in data coordinates
        horizontalalignment='center',  # Horizontal text alignment
        verticalalignment='top',  # Vertical text alignment
        rotation="vertical",  # Label rotation
        color='darkslategrey',  # Label color
        alpha=1,
        fontsize=16
    )
    # Label for the mean line
    plt.annotate(
        text=f"mean {x.mean():,.1f}",  # Mean line annotation
        xy=(x.mean()-0.275, 16),  # Label position in data coordinates
        horizontalalignment='center',  # Horizontal text alignment
        verticalalignment='top',  # Vertical text alignment
        rotation="vertical",  # Label rotation
        color='navy',  # Label color
        alpha=1,
        fontsize=16
    )
# Apply the line-drawing function to the grid,
# passing it the age values
g.map(age_lines, x=df_age['age'])
plt.show()
ВЫВОД
# Выберем половой признак из таблицы hr_dataset и подсчитаем количество работников
sql_quiery = \
"""
(SELECT
sex,
COUNT("Employee Number") AS "Employee Count"
FROM
hr_dataset
WHERE -- Условие, что работники действующие
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
sex
ORDER BY
sex
)
UNION ALL
(SELECT
'TOTALS',
COUNT("Employee Number") AS "Employee Count"
FROM
hr_dataset
WHERE -- Условие, что работники действующие
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
)
;
"""
df_empsex = pd.read_sql(sql_quiery, conn)
df_empsex
| | sex | Employee Count |
|---|---|---|
| 0 | Female | 118 |
| 1 | Male | 90 |
| 2 | TOTALS | 208 |
# Build a pie chart
# Select the data to plot (drop the trailing TOTALS row)
dfg_empsex = df_empsex[:-1]
# Create the figure and axes for the chart
fig, ax = plt.subplots(figsize=(16, 6), ncols=1, nrows=1)
fig.suptitle("Distribution of current employees by sex", fontsize=16, y=0.9)
cmap = plt.colormaps["Pastel1"] # Choose the color palette
my_colors = cmap(np.arange(2)*3) # Pick the chart colors: np.arange(number of segments) * step through the palette
ax.pie(dfg_empsex["Employee Count"],
labels = dfg_empsex["sex"],
autopct='%1.1f%%',
textprops={'fontsize':14},
pctdistance=0.8,
colors=my_colors
)
plt.show()
CONCLUSION
The company's current staff is predominantly female: about 57% of employees (118 of 208) are women.
# Select the racial and ethnic background of employees from the hr_dataset table
sql_quiery = \
"""
(SELECT
COALESCE(racedesc, '[TOTAL]') AS "Race",
COUNT("Employee Number") AS number_of_employees
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(racedesc)
ORDER BY
racedesc
)
;
"""
df_races = pd.read_sql(sql_quiery, conn)
df_races
| | Race | number_of_employees |
|---|---|---|
| 0 | American Indian or Alaska Native | 4 |
| 1 | Asian | 23 |
| 2 | Black or African American | 40 |
| 3 | Hispanic | 3 |
| 4 | Two or more races | 11 |
| 5 | White | 127 |
| 6 | [TOTAL] | 208 |
# Build a pie chart
# Select the data to plot (drop the trailing [TOTAL] row)
dfg_races = df_races[:-1]
# Create the figure and axes for the chart
fig, ax = plt.subplots(figsize=(16, 7), ncols=1, nrows=1)
fig.suptitle("Distribution of current employees by race and ethnicity", fontsize=16, y=0.825)
cmap = plt.colormaps["Dark2"] # Choose the color palette (categorical)
my_colors = cmap(np.arange(6)*1) # Pick the chart colors: np.arange(number of segments) * step through the palette
#cmap = plt.colormaps["summer"] # Выберем цветовую палитру (последовательную)
#my_colors =cmap(np.linspace(0.2, 0.8, dfg_races.shape[0]))
ax.pie(dfg_races["number_of_employees"],
labels = dfg_races["Race"],
autopct='%1.1f%%',
textprops={'fontsize':12},
pctdistance=0.8,
radius=0.8,
colors=my_colors
)
plt.show()
ВЫВОД
NOTE
The classification used in hr_dataset mixes races and ethnic groups and matches neither modern nor historical classifications. It is closest to a geographic classification of ethnic groups, but it does not match that either.
# Выберем варианты семейного положения сотрудников из таблицы hr_dataset
sql_quiery = \
"""
(SELECT
COALESCE(maritaldesc, '[TOTAL]') AS "marital_status",
COUNT("Employee Number") AS number_of_employees
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(maritaldesc)
ORDER BY
maritaldesc
)
;
"""
df_marital = pd.read_sql(sql_quiery, conn)
df_marital
| | marital_status | number_of_employees |
|---|---|---|
| 0 | Divorced | 14 |
| 1 | Married | 78 |
| 2 | Separated | 11 |
| 3 | Single | 101 |
| 4 | Widowed | 4 |
| 5 | [TOTAL] | 208 |
# Build a pie chart
# Select the data to plot (drop the trailing [TOTAL] row)
dfg_marital = df_marital[:-1]
# Create the figure and axes for the chart
fig, ax = plt.subplots(figsize=(16, 7), ncols=1, nrows=1)
fig.suptitle("Distribution of current employees by marital status", fontsize=16, y=0.825)
cmap = plt.colormaps["Paired"] # Choose the color palette (categorical)
my_colors = cmap(np.arange(6)*2) # Pick the chart colors: np.arange(number of segments) * step through the palette
#cmap = plt.colormaps["summer"] # Выберем цветовую палитру (последовательную)
#my_colors =cmap(np.linspace(0.2, 0.8, dfg_races.shape[0]))
ax.pie(dfg_marital["number_of_employees"],
labels = dfg_marital["marital_status"],
autopct='%1.1f%%',
textprops={'fontsize':12},
pctdistance=0.8,
radius=0.8,
colors=my_colors
)
plt.show()
ВЫВОД
# Выберем значения для параметра гражданства из таблицы hr_dataset
sql_quiery = \
"""
(SELECT
COALESCE(citizendesc, '[TOTAL]') AS citizenship,
COUNT("Employee Number") AS number_of_employees
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(citizendesc)
ORDER BY
citizendesc
)
;
"""
df_citizen = pd.read_sql(sql_quiery, conn)
df_citizen
| | citizenship | number_of_employees |
|---|---|---|
| 0 | Eligible NonCitizen | 7 |
| 1 | Non-Citizen | 1 |
| 2 | US Citizen | 200 |
| 3 | [TOTAL] | 208 |
ВЫВОД
# Выберем штаты проживания сотрудников из таблицы hr_dataset
sql_quiery = \
"""
(SELECT
COALESCE(state, '[TOTAL]') AS "state",
COUNT("Employee Number") AS number_of_employees,
ROUND(
-- Cast the "Employee Number" count to numeric(9, 2), i.e. up to 9 digits in total with
-- 2 of them after the decimal point, so that the division below is not integer division
CAST(
COUNT("Employee Number")
AS numeric(9, 2)) /
-- Результат выведем так же с 2 знаками после запятой
(SELECT COUNT("Employee Number") FROM hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence') * 100, 2)
AS percent_of_all
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(state)
ORDER BY
number_of_employees DESC
)
;
"""
df_state = pd.read_sql(sql_quiery, conn)
df_state
| | state | number_of_employees | percent_of_all |
|---|---|---|---|
| 0 | [TOTAL] | 208 | 100.00 |
| 1 | MA | 178 | 85.58 |
| 2 | CT | 5 | 2.40 |
| 3 | TX | 3 | 1.44 |
| 4 | VT | 2 | 0.96 |
| 5 | MT | 1 | 0.48 |
| 6 | NV | 1 | 0.48 |
| 7 | KY | 1 | 0.48 |
| 8 | NY | 1 | 0.48 |
| 9 | WA | 1 | 0.48 |
| 10 | AL | 1 | 0.48 |
| 11 | IN | 1 | 0.48 |
| 12 | UT | 1 | 0.48 |
| 13 | FL | 1 | 0.48 |
| 14 | CO | 1 | 0.48 |
| 15 | ME | 1 | 0.48 |
| 16 | NC | 1 | 0.48 |
| 17 | AZ | 1 | 0.48 |
| 18 | RI | 1 | 0.48 |
| 19 | GA | 1 | 0.48 |
| 20 | ID | 1 | 0.48 |
| 21 | CA | 1 | 0.48 |
| 22 | NH | 1 | 0.48 |
| 23 | OR | 1 | 0.48 |
| 24 | ND | 1 | 0.48 |
ВЫВОД
This dependence does not directly affect the production process, but it may be of interest for describing the company's current staff.
sql_quiery = \
"""
SELECT
sex,
age,
maritaldesc
FROM
hr_dataset
WHERE -- Условие, что работники действующие
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
maritaldesc
;
"""
dfg_age_sex_marital_dependance = pd.read_sql(sql_quiery, conn)
# dfg_age_sex_marital_dependance
# Draw a faceted plot of the distribution of employees by age, sex, and marital status
g=sns.displot(
dfg_age_sex_marital_dependance,
x="age",
col="sex",
row="maritaldesc",
hue="maritaldesc",
binwidth=1,
bins=42,
height=3,
aspect = 1.8,
row_order=["Single","Married", "Separated", "Divorced", "Widowed"],
facet_kws=dict(margin_titles=True, sharex=False),
)
# Определяем шкалу:
scale_step = 5 # Определяем шаг шкалы X - его можно менять!
# Определяем основной диапазон шкалы
scale_span = list(range((int(dfg_age_sex_marital_dependance['age'].min() / scale_step) * scale_step),
int(dfg_age_sex_marital_dependance['age'].max()),
scale_step))
# If necessary, add the first and last values to the scale (if they are not on it already)
if dfg_age_sex_marital_dependance['age'].min() < scale_span[0]:
    scale_span = [dfg_age_sex_marital_dependance['age'].min()] + scale_span
if dfg_age_sex_marital_dependance['age'].max() > scale_span[-1]:
    scale_span = scale_span + [dfg_age_sex_marital_dependance['age'].max()]
g.set(xticks=scale_span)
g.fig.suptitle("Distribution of employees by age, sex, and marital status", fontsize=16, x=0.45, y=1.03)
# Определим размер подписи оси X
g.set_xlabels(fontsize=10)
# Определим размер подписи оси Y
g.set_ylabels(fontsize=10)
plt.show()
CONCLUSION
While describing the company's staff, the following parameters were identified whose relationships could be investigated.
In principle, any interdependencies could be investigated. For the purposes of this study, however, only those dependencies matter that directly or indirectly affect the company's performance. Internal dependencies among socio-demographic indicators are of interest only insofar as they describe the company's current staff.
Production and economic indicators are the main focus. They depend on structural indicators, on socio-demographic indicators, and on other production and economic indicators. Socio-demographic indicators also depend on structural indicators (in terms of how they are distributed across the structure). Structural indicators, in turn, depend on other structural indicators. We will consider them in reverse order.
Therefore, given the objectives of the study, the dependencies of interest are:
1) for structural indicators: on other structural indicators
2) for socio-demographic indicators: on structural indicators
3) for production and economic indicators: on structural, socio-demographic, and other production and economic indicators.
NOTE
- When we speak of one indicator depending on another, we mean that some function y=f(x) links them. For example, the dependence of employment status on department means y="Employment Status"("Department"). As section 1.1.2.1 will show, when measured in number of employees, for the employment status "Active" and the department "Production" the function takes the form y="Status Active"("Production"), and here y=106.
- Employees are distributed unevenly across the company structure. Moreover, employees in different departments and positions perform different functions. Using only absolute counts and examining their dependencies over the entire staff can therefore be inappropriate and lead to erroneous conclusions. To avoid this, the indicators need to be normalized and their within-group distributions examined, which is done here using percentages.
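The two points in this note, the lookup y=f(x) and the within-group normalization, can be sketched on a small in-memory table. This is a minimal illustration, not the notebook's own query. It assumes SQLite 3.25 or later (for window functions) instead of the PostgreSQL server. The table name `status_dptmnt_demo` and the connection name `demo_conn` are hypothetical. The counts are the Production and Sales figures from the query output shown further below.

```python
import sqlite3

# Counts per (status, department), copied from the notebook's query output
rows = [
    ("Active", "Production", 106),
    ("Future Start", "Production", 8),
    ("Leave of Absence", "Production", 11),
    ("Terminated for Cause", "Production", 8),
    ("Voluntarily Terminated", "Production", 75),
    ("Active", "Sales", 26),
    ("Future Start", "Sales", 1),
    ("Terminated for Cause", "Sales", 1),
    ("Voluntarily Terminated", "Sales", 3),
]

# Hypothetical in-memory database used only for this illustration
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE status_dptmnt_demo "
    "(empl_status TEXT, department TEXT, employee_count INTEGER)"
)
demo_conn.executemany("INSERT INTO status_dptmnt_demo VALUES (?, ?, ?)", rows)

# Share of each status inside its own department.
# The 100.0 factor forces floating-point division in SQLite.
query = """
SELECT
    empl_status,
    department,
    employee_count,
    ROUND(100.0 * employee_count
          / SUM(employee_count) OVER (PARTITION BY department), 2)
        AS percent_of_department
FROM status_dptmnt_demo
ORDER BY department, empl_status
"""
result = {
    (status, dept): (count, pct)
    for status, dept, count, pct in demo_conn.execute(query)
}

for key, value in result.items():
    print(key, value)
```

The absolute counts for Active staff (106 in Production against 26 in Sales) suggest a very different picture from the normalized shares, which is why the sections below compare percentages within each group.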
Section 1.1.1 above examined the company's current structure, that is, the structure comprising working employees, employees on leave of absence, and employees who have already been hired into staff positions but have not yet started work. Among the relationships between structural indicators, the only one not yet examined is the link between employment statuses and the company structure.
# Select employment statuses and departments.
# Count the employees in each group
sql_quiery = \
"""
WITH
status_dptmnt AS
(SELECT
"Employment Status",
department,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
department,
"Employment Status"
ORDER BY
department,
"Employment Status"
)
SELECT
"Employment Status",
department,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY department))*100, 2)
AS percent_of_department
FROM
status_dptmnt
ORDER BY
"Employment Status",
department
;"""
df_empl_status_over_dptmnt = pd.read_sql(sql_quiery, conn)
df_empl_status_over_dptmnt
| | Employment Status | department | employee_count | percent_of_department |
|---|---|---|---|---|
| 0 | Active | Admin Offices | 8 | 80.00 |
| 1 | Active | Executive Office | 1 | 100.00 |
| 2 | Active | IT/IS | 35 | 70.00 |
| 3 | Active | Production | 106 | 50.96 |
| 4 | Active | Sales | 26 | 83.87 |
| 5 | Active | Software Engineering | 7 | 70.00 |
| 6 | Future Start | IT/IS | 2 | 4.00 |
| 7 | Future Start | Production | 8 | 3.85 |
| 8 | Future Start | Sales | 1 | 3.23 |
| 9 | Leave of Absence | IT/IS | 3 | 6.00 |
| 10 | Leave of Absence | Production | 11 | 5.29 |
| 11 | Terminated for Cause | IT/IS | 4 | 8.00 |
| 12 | Terminated for Cause | Production | 8 | 3.85 |
| 13 | Terminated for Cause | Sales | 1 | 3.23 |
| 14 | Terminated for Cause | Software Engineering | 1 | 10.00 |
| 15 | Voluntarily Terminated | Admin Offices | 2 | 20.00 |
| 16 | Voluntarily Terminated | IT/IS | 6 | 12.00 |
| 17 | Voluntarily Terminated | Production | 75 | 36.06 |
| 18 | Voluntarily Terminated | Sales | 3 | 9.68 |
| 19 | Voluntarily Terminated | Software Engineering | 2 | 20.00 |
# Build a bar chart of within-department percentages by employment status
g = sns.catplot(x="Employment Status",
y="percent_of_department",
hue="Employment Status",
col="department",
col_wrap=3,
data=df_empl_status_over_dptmnt,
kind="bar",
order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
height=3.2,
aspect=1.5,
sharex=True,
sharey=True,
dodge=False,
palette="tab20"
)
# Set the figure title
g.fig.suptitle("Distribution of employment statuses by department, %", fontsize=16, x=0.325, y=1.125)
# Set the X-axis label font size
g.set_xlabels(fontsize=12)
# Set the Y-axis label and its font size
g.set_ylabels("Percent of department staff", fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=60,fontsize=10)
# Add a shared legend and set its size and placement (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0.025, 0.9, 0.6, 0.2), loc='upper center', ncol=6, title='Employment Status')
plt.show()
CONCLUSION
# Select departments, positions, employment statuses, and employee counts.
# Count the employees in each group
sql_quiery = \
"""
WITH
status_position AS
(SELECT
"Employment Status",
department,
position,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
position,
department,
"Employment Status"
ORDER BY
department,
"Employment Status"
)
SELECT
"Employment Status",
department,
position,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY position))*100, 2)
AS percent_of_position
FROM
status_position
ORDER BY
department,
position,
"Employment Status"
;"""
df_empl_status_over_postition = pd.read_sql(sql_quiery, conn)
df_empl_status_over_postition
| | Employment Status | department | position | employee_count | percent_of_position |
|---|---|---|---|---|---|
| 0 | Active | Admin Offices | Accountant I | 3 | 100.00 |
| 1 | Active | Admin Offices | Administrative Assistant | 2 | 66.67 |
| 2 | Voluntarily Terminated | Admin Offices | Administrative Assistant | 1 | 33.33 |
| 3 | Active | Admin Offices | Shared Services Manager | 1 | 50.00 |
| 4 | Voluntarily Terminated | Admin Offices | Shared Services Manager | 1 | 50.00 |
| 5 | Active | Admin Offices | Sr. Accountant | 2 | 100.00 |
| 6 | Active | Executive Office | President & CEO | 1 | 100.00 |
| 7 | Active | IT/IS | BI Developer | 4 | 100.00 |
| 8 | Active | IT/IS | BI Director | 1 | 100.00 |
| 9 | Active | IT/IS | CIO | 1 | 100.00 |
| 10 | Active | IT/IS | Data Architect | 1 | 100.00 |
| 11 | Active | IT/IS | Database Administrator | 7 | 53.85 |
| 12 | Leave of Absence | IT/IS | Database Administrator | 1 | 7.69 |
| 13 | Terminated for Cause | IT/IS | Database Administrator | 3 | 23.08 |
| 14 | Voluntarily Terminated | IT/IS | Database Administrator | 2 | 15.38 |
| 15 | Active | IT/IS | IT Director | 1 | 100.00 |
| 16 | Active | IT/IS | IT Manager - DB | 1 | 50.00 |
| 17 | Voluntarily Terminated | IT/IS | IT Manager - DB | 1 | 50.00 |
| 18 | Active | IT/IS | IT Manager - Infra | 1 | 100.00 |
| 19 | Active | IT/IS | IT Manager - Support | 1 | 100.00 |
| 20 | Active | IT/IS | IT Support | 4 | 100.00 |
| 21 | Active | IT/IS | Network Engineer | 8 | 88.89 |
| 22 | Voluntarily Terminated | IT/IS | Network Engineer | 1 | 11.11 |
| 23 | Active | IT/IS | Senior BI Developer | 3 | 100.00 |
| 24 | Future Start | IT/IS | Sr. DBA | 1 | 25.00 |
| 25 | Terminated for Cause | IT/IS | Sr. DBA | 1 | 25.00 |
| 26 | Voluntarily Terminated | IT/IS | Sr. DBA | 2 | 50.00 |
| 27 | Active | IT/IS | Sr. Network Engineer | 2 | 40.00 |
| 28 | Future Start | IT/IS | Sr. Network Engineer | 1 | 20.00 |
| 29 | Leave of Absence | IT/IS | Sr. Network Engineer | 2 | 40.00 |
| 30 | Active | Production | Director of Operations | 1 | 100.00 |
| 31 | Active | Production | Production Manager | 9 | 64.29 |
| 32 | Terminated for Cause | Production | Production Manager | 1 | 7.14 |
| 33 | Voluntarily Terminated | Production | Production Manager | 4 | 28.57 |
| 34 | Active | Production | Production Technician I | 73 | 53.68 |
| 35 | Future Start | Production | Production Technician I | 4 | 2.94 |
| 36 | Leave of Absence | Production | Production Technician I | 7 | 5.15 |
| 37 | Terminated for Cause | Production | Production Technician I | 7 | 5.15 |
| 38 | Voluntarily Terminated | Production | Production Technician I | 45 | 33.09 |
| 39 | Active | Production | Production Technician II | 23 | 40.35 |
| 40 | Future Start | Production | Production Technician II | 4 | 7.02 |
| 41 | Leave of Absence | Production | Production Technician II | 4 | 7.02 |
| 42 | Voluntarily Terminated | Production | Production Technician II | 26 | 45.61 |
| 43 | Active | Sales | Area Sales Manager | 23 | 85.19 |
| 44 | Future Start | Sales | Area Sales Manager | 1 | 3.70 |
| 45 | Terminated for Cause | Sales | Area Sales Manager | 1 | 3.70 |
| 46 | Voluntarily Terminated | Sales | Area Sales Manager | 2 | 7.41 |
| 47 | Active | Sales | Director of Sales | 1 | 100.00 |
| 48 | Active | Sales | Sales Manager | 2 | 66.67 |
| 49 | Voluntarily Terminated | Sales | Sales Manager | 1 | 33.33 |
| 50 | Active | Software Engineering | Software Engineer | 6 | 66.67 |
| 51 | Terminated for Cause | Software Engineering | Software Engineer | 1 | 11.11 |
| 52 | Voluntarily Terminated | Software Engineering | Software Engineer | 2 | 22.22 |
| 53 | Active | Software Engineering | Software Engineering Manager | 1 | 100.00 |
# Build a bar chart of within-position percentages by employment status
g = sns.catplot(x="Employment Status",
order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
y="percent_of_position",
col="position",
hue="department",
legend=False, # Do not draw the legend inside the grid; a separate one is added below
legend_out=True, # Place the legend outside the plots
col_wrap=4,
sharex=False,
sharey=True,
data=df_empl_status_over_postition,
kind="bar",
height=3,
aspect=1.2,
dodge=False,
alpha=1.0
)
# Set the figure title
g.fig.suptitle("Distribution of employment statuses by position, %", fontsize=16, x=0.3, y=1.05)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=1)
# Set the X-axis label font size
g.set_xlabels(fontsize=9)
# Set the Y-axis label font size
g.set_ylabels(fontsize=9)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=60,fontsize=8)
# Add a shared legend and set its size and placement (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0, 0.83, 0.6, 0.2), loc='upper center', ncol=6, title='Departments')
plt.show()
CONCLUSION
# Select age and employment status.
sql_quiery = \
"""
SELECT
"Employment Status",
age
FROM
hr_dataset
ORDER BY
age,
"Employment Status"
;"""
df_age_over_status = pd.read_sql(sql_quiery, conn)
df_age_over_status
| | Employment Status | age |
|---|---|---|
| 0 | Active | 25 |
| 1 | Active | 25 |
| 2 | Future Start | 26 |
| 3 | Voluntarily Terminated | 26 |
| 4 | Active | 27 |
| 5 | Active | 27 |
| 6 | Active | 27 |
| 7 | Active | 27 |
| 8 | Voluntarily Terminated | 27 |
| 9 | Active | 28 |
| 10 | Active | 28 |
| 11 | Active | 28 |
| 12 | Active | 28 |
| 13 | Active | 28 |
| 14 | Active | 28 |
| 15 | Active | 28 |
| 16 | Active | 28 |
| 17 | Active | 28 |
| 18 | Voluntarily Terminated | 28 |
| 19 | Voluntarily Terminated | 28 |
| 20 | Voluntarily Terminated | 28 |
| 21 | Active | 29 |
| 22 | Active | 29 |
| 23 | Active | 29 |
| 24 | Active | 29 |
| 25 | Active | 29 |
| 26 | Active | 29 |
| 27 | Active | 29 |
| 28 | Active | 29 |
| 29 | Active | 29 |
| 30 | Active | 29 |
| 31 | Active | 29 |
| 32 | Active | 29 |
| 33 | Terminated for Cause | 29 |
| 34 | Voluntarily Terminated | 29 |
| 35 | Voluntarily Terminated | 29 |
| 36 | Active | 30 |
| 37 | Active | 30 |
| 38 | Active | 30 |
| 39 | Active | 30 |
| 40 | Active | 30 |
| 41 | Active | 30 |
| 42 | Active | 30 |
| 43 | Active | 30 |
| 44 | Active | 30 |
| 45 | Voluntarily Terminated | 30 |
| 46 | Voluntarily Terminated | 30 |
| 47 | Voluntarily Terminated | 30 |
| 48 | Voluntarily Terminated | 30 |
| 49 | Voluntarily Terminated | 30 |
| 50 | Voluntarily Terminated | 30 |
| 51 | Active | 31 |
| 52 | Active | 31 |
| 53 | Active | 31 |
| 54 | Active | 31 |
| 55 | Active | 31 |
| 56 | Active | 31 |
| 57 | Active | 31 |
| 58 | Active | 31 |
| 59 | Active | 31 |
| 60 | Active | 31 |
| 61 | Active | 31 |
| 62 | Active | 31 |
| 63 | Active | 31 |
| 64 | Active | 31 |
| 65 | Future Start | 31 |
| 66 | Future Start | 31 |
| 67 | Leave of Absence | 31 |
| 68 | Terminated for Cause | 31 |
| 69 | Terminated for Cause | 31 |
| 70 | Voluntarily Terminated | 31 |
| 71 | Voluntarily Terminated | 31 |
| 72 | Voluntarily Terminated | 31 |
| 73 | Voluntarily Terminated | 31 |
| 74 | Voluntarily Terminated | 31 |
| 75 | Voluntarily Terminated | 31 |
| 76 | Active | 32 |
| 77 | Active | 32 |
| 78 | Active | 32 |
| 79 | Active | 32 |
| 80 | Active | 32 |
| 81 | Active | 32 |
| 82 | Leave of Absence | 32 |
| 83 | Leave of Absence | 32 |
| 84 | Terminated for Cause | 32 |
| 85 | Terminated for Cause | 32 |
| 86 | Voluntarily Terminated | 32 |
| 87 | Voluntarily Terminated | 32 |
| 88 | Active | 33 |
| 89 | Active | 33 |
| 90 | Active | 33 |
| 91 | Active | 33 |
| 92 | Active | 33 |
| 93 | Active | 33 |
| 94 | Active | 33 |
| 95 | Active | 33 |
| 96 | Active | 33 |
| 97 | Future Start | 33 |
| 98 | Future Start | 33 |
| 99 | Terminated for Cause | 33 |
| 100 | Voluntarily Terminated | 33 |
| 101 | Voluntarily Terminated | 33 |
| 102 | Voluntarily Terminated | 33 |
| 103 | Voluntarily Terminated | 33 |
| 104 | Voluntarily Terminated | 33 |
| 105 | Active | 34 |
| 106 | Active | 34 |
| 107 | Active | 34 |
| 108 | Active | 34 |
| 109 | Active | 34 |
| 110 | Active | 34 |
| 111 | Active | 34 |
| 112 | Active | 34 |
| 113 | Active | 34 |
| 114 | Active | 34 |
| 115 | Future Start | 34 |
| 116 | Leave of Absence | 34 |
| 117 | Terminated for Cause | 34 |
| 118 | Voluntarily Terminated | 34 |
| 119 | Voluntarily Terminated | 34 |
| 120 | Voluntarily Terminated | 34 |
| 121 | Voluntarily Terminated | 34 |
| 122 | Voluntarily Terminated | 34 |
| 123 | Voluntarily Terminated | 34 |
| 124 | Active | 35 |
| 125 | Active | 35 |
| 126 | Active | 35 |
| 127 | Active | 35 |
| 128 | Active | 35 |
| 129 | Active | 35 |
| 130 | Active | 35 |
| 131 | Active | 35 |
| 132 | Active | 35 |
| 133 | Future Start | 35 |
| 134 | Voluntarily Terminated | 35 |
| 135 | Voluntarily Terminated | 35 |
| 136 | Voluntarily Terminated | 35 |
| 137 | Active | 36 |
| 138 | Active | 36 |
| 139 | Active | 36 |
| 140 | Active | 36 |
| 141 | Active | 36 |
| 142 | Active | 36 |
| 143 | Active | 36 |
| 144 | Active | 36 |
| 145 | Terminated for Cause | 36 |
| 146 | Voluntarily Terminated | 36 |
| 147 | Voluntarily Terminated | 36 |
| 148 | Voluntarily Terminated | 36 |
| 149 | Voluntarily Terminated | 36 |
| 150 | Active | 37 |
| 151 | Active | 37 |
| 152 | Active | 37 |
| 153 | Active | 37 |
| 154 | Future Start | 37 |
| 155 | Terminated for Cause | 37 |
| 156 | Voluntarily Terminated | 37 |
| 157 | Voluntarily Terminated | 37 |
| 158 | Active | 38 |
| 159 | Active | 38 |
| 160 | Active | 38 |
| 161 | Active | 38 |
| 162 | Active | 38 |
| 163 | Active | 38 |
| 164 | Active | 38 |
| 165 | Active | 38 |
| 166 | Active | 38 |
| 167 | Active | 38 |
| 168 | Terminated for Cause | 38 |
| 169 | Voluntarily Terminated | 38 |
| 170 | Active | 39 |
| 171 | Active | 39 |
| 172 | Active | 39 |
| 173 | Active | 39 |
| 174 | Active | 39 |
| 175 | Active | 39 |
| 176 | Active | 39 |
| 177 | Active | 39 |
| 178 | Active | 39 |
| 179 | Active | 39 |
| 180 | Active | 39 |
| 181 | Active | 39 |
| 182 | Leave of Absence | 39 |
| 183 | Leave of Absence | 39 |
| 184 | Voluntarily Terminated | 39 |
| 185 | Voluntarily Terminated | 39 |
| 186 | Voluntarily Terminated | 39 |
| 187 | Voluntarily Terminated | 39 |
| 188 | Active | 40 |
| 189 | Active | 40 |
| 190 | Active | 40 |
| 191 | Active | 40 |
| 192 | Active | 40 |
| 193 | Active | 40 |
| 194 | Voluntarily Terminated | 40 |
| 195 | Voluntarily Terminated | 40 |
| 196 | Active | 41 |
| 197 | Active | 41 |
| 198 | Active | 41 |
| 199 | Active | 41 |
| 200 | Active | 41 |
| 201 | Future Start | 41 |
| 202 | Leave of Absence | 41 |
| 203 | Leave of Absence | 41 |
| 204 | Voluntarily Terminated | 41 |
| 205 | Voluntarily Terminated | 41 |
| 206 | Voluntarily Terminated | 41 |
| 207 | Voluntarily Terminated | 41 |
| 208 | Active | 42 |
| 209 | Active | 42 |
| 210 | Active | 42 |
| 211 | Active | 42 |
| 212 | Active | 42 |
| 213 | Active | 42 |
| 214 | Leave of Absence | 42 |
| 215 | Voluntarily Terminated | 42 |
| 216 | Active | 43 |
| 217 | Active | 43 |
| 218 | Active | 43 |
| 219 | Leave of Absence | 43 |
| 220 | Leave of Absence | 43 |
| 221 | Voluntarily Terminated | 43 |
| 222 | Voluntarily Terminated | 43 |
| 223 | Voluntarily Terminated | 43 |
| 224 | Voluntarily Terminated | 43 |
| 225 | Active | 44 |
| 226 | Active | 44 |
| 227 | Active | 44 |
| 228 | Active | 44 |
| 229 | Active | 44 |
| 230 | Voluntarily Terminated | 44 |
| 231 | Voluntarily Terminated | 44 |
| 232 | Voluntarily Terminated | 44 |
| 233 | Voluntarily Terminated | 44 |
| 234 | Active | 45 |
| 235 | Active | 45 |
| 236 | Active | 45 |
| 237 | Active | 45 |
| 238 | Active | 45 |
| 239 | Terminated for Cause | 45 |
| 240 | Voluntarily Terminated | 45 |
| 241 | Voluntarily Terminated | 45 |
| 242 | Voluntarily Terminated | 45 |
| 243 | Voluntarily Terminated | 45 |
| 244 | Voluntarily Terminated | 45 |
| 245 | Voluntarily Terminated | 45 |
| 246 | Active | 46 |
| 247 | Terminated for Cause | 46 |
| 248 | Terminated for Cause | 46 |
| 249 | Active | 47 |
| 250 | Active | 47 |
| 251 | Active | 47 |
| 252 | Active | 47 |
| 253 | Voluntarily Terminated | 47 |
| 254 | Voluntarily Terminated | 47 |
| 255 | Active | 48 |
| 256 | Active | 48 |
| 257 | Active | 48 |
| 258 | Active | 48 |
| 259 | Active | 48 |
| 260 | Leave of Absence | 48 |
| 261 | Leave of Absence | 48 |
| 262 | Voluntarily Terminated | 48 |
| 263 | Voluntarily Terminated | 48 |
| 264 | Voluntarily Terminated | 48 |
| 265 | Active | 49 |
| 266 | Active | 49 |
| 267 | Active | 49 |
| 268 | Active | 49 |
| 269 | Active | 49 |
| 270 | Active | 49 |
| 271 | Voluntarily Terminated | 49 |
| 272 | Active | 50 |
| 273 | Future Start | 50 |
| 274 | Voluntarily Terminated | 50 |
| 275 | Active | 51 |
| 276 | Active | 51 |
| 277 | Active | 51 |
| 278 | Active | 51 |
| 279 | Voluntarily Terminated | 51 |
| 280 | Active | 52 |
| 281 | Active | 52 |
| 282 | Active | 52 |
| 283 | Future Start | 52 |
| 284 | Voluntarily Terminated | 52 |
| 285 | Active | 53 |
| 286 | Active | 53 |
| 287 | Terminated for Cause | 53 |
| 288 | Voluntarily Terminated | 53 |
| 289 | Voluntarily Terminated | 53 |
| 290 | Active | 54 |
| 291 | Active | 54 |
| 292 | Active | 54 |
| 293 | Voluntarily Terminated | 54 |
| 294 | Voluntarily Terminated | 54 |
| 295 | Active | 55 |
| 296 | Active | 56 |
| 297 | Voluntarily Terminated | 58 |
| 298 | Active | 59 |
| 299 | Voluntarily Terminated | 59 |
| 300 | Voluntarily Terminated | 62 |
| 301 | Active | 63 |
| 302 | Active | 63 |
| 303 | Voluntarily Terminated | 63 |
| 304 | Voluntarily Terminated | 65 |
| 305 | Voluntarily Terminated | 65 |
| 306 | Leave of Absence | 66 |
| 307 | Voluntarily Terminated | 66 |
| 308 | Active | 67 |
| 309 | Voluntarily Terminated | 67 |
g=sns.displot(data=df_age_over_status,
x="age",
hue="Employment Status",
hue_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
col="Employment Status",
col_wrap=1,
col_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
binwidth=1,
height=4,
aspect=2.5)
# Define the X-axis scale:
scale_step = 5  # X-axis tick step; this can be changed
# Build the main tick range from the data being plotted
scale_span = list(range((int(df_age_over_status['age'].min() / scale_step) * scale_step),
                        int(df_age_over_status['age'].max()),
                        scale_step))
# If needed, add the first and last values to the scale (when they are not already in it)
if df_age_over_status['age'].min() < scale_span[0]:
    scale_span = ([df_age_over_status['age'].min()] + scale_span)
if df_age_over_status['age'].max() > scale_span[-1]:
    scale_span = scale_span + [df_age_over_status['age'].max()]
plt.xticks(ticks=scale_span, fontsize=12)  # Set the X-axis ticks and their label size
plt.yticks(fontsize=12)  # Set the Y-axis tick label size
g.set_xlabels("Employee Age", fontsize=14)  # X-axis label and its size
g.set_ylabels("Employee Count", fontsize=14)  # Y-axis label and its size
# Figure title:
g.fig.suptitle("Age distribution of employees by employment status", fontsize=20, y=1.0)
# Add vertical lines for the median and the arithmetic mean of age.
# To do this, define a function that draws a vertical line and its label
def age_lines(x, **kwargs):
    # Draw the median line from the x values passed in: color, width, style
    plt.axvline(x.median(), color='darkslategrey', linewidth=2, linestyle=':')
    # Draw the mean line from the x values passed in: color, width, style
    plt.axvline(x.mean(), color='navy', linewidth=2, linestyle='-.')
    # Label the median line
    plt.annotate(
        text=f"median {x.median()}",    # Median line annotation
        xy=(x.median()-0.3, 9),         # Label position in data coordinates
        horizontalalignment='right',    # Horizontal text alignment
        verticalalignment='bottom',     # Vertical text alignment
        rotation="vertical",            # Label rotation
        color='darkslategrey',          # Label color
        alpha=1,
        fontsize=14
    )
    # Label the mean line
    plt.annotate(
        text=f"mean {x.mean():,.1f}",   # Mean line annotation
        xy=(x.mean()+0.3, 9),           # Label position in data coordinates
        horizontalalignment='left',     # Horizontal text alignment
        verticalalignment='bottom',     # Vertical text alignment
        rotation="vertical",            # Label rotation
        color='navy',                   # Label color
        alpha=1,
        fontsize=14
    )
# Apply the line-drawing function to every facet,
# passing it the ages of that facet
g.map(age_lines, 'age')
plt.show()
CONCLUSION
Overall, both hiring and terminations tend to involve employees who are younger than the company's core workforce.
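The median and mean lines drawn on each facet can also be checked numerically. The sketch below uses only the standard library and is illustrative rather than part of the notebook. The ages are transcribed by hand from the query output above for the three smallest status groups. The Active and Voluntarily Terminated groups are omitted for brevity, so the sketch does not by itself confirm the comparison with the core workforce.

```python
from statistics import mean, median

# Ages per status, transcribed from the query output above (three smallest groups only)
ages_by_status = {
    "Future Start": [26, 31, 31, 33, 33, 34, 35, 37, 41, 50, 52],
    "Leave of Absence": [31, 32, 32, 34, 39, 39, 41, 41, 42, 43, 43, 48, 48, 66],
    "Terminated for Cause": [29, 31, 31, 32, 32, 33, 34, 36, 37, 38, 45, 46, 46, 53],
}

# (median, mean rounded to one decimal) for every group
summary = {
    status: (median(ages), round(mean(ages), 1))
    for status, ages in ages_by_status.items()
}

for status, (med, avg) in summary.items():
    print(f"{status}: n={len(ages_by_status[status])}, median {med}, mean {avg}")
```

With groups of only 11 to 14 people, a single older employee moves the mean noticeably, which is one reason to read the median alongside it.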
# Select sex and employment status
sql_quiery = \
"""
WITH
sex_status AS
(SELECT
sex,
"Employment Status" AS empl_status,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
"Employment Status",
sex
ORDER BY
sex,
"Employment Status"
)
SELECT
sex,
empl_status,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY empl_status))*100, 2)
AS percent_of_status
FROM
sex_status
GROUP BY
empl_status,
sex,
employee_count
ORDER BY
sex,
empl_status
;"""
df_sex_over_status = pd.read_sql(sql_quiery, conn)
df_sex_over_status
| | sex | empl_status | employee_count | percent_of_status |
|---|---|---|---|---|
| 0 | Female | Active | 101 | 55.19 |
| 1 | Female | Future Start | 8 | 72.73 |
| 2 | Female | Leave of Absence | 9 | 64.29 |
| 3 | Female | Terminated for Cause | 8 | 57.14 |
| 4 | Female | Voluntarily Terminated | 51 | 57.95 |
| 5 | Male | Active | 82 | 44.81 |
| 6 | Male | Future Start | 3 | 27.27 |
| 7 | Male | Leave of Absence | 5 | 35.71 |
| 8 | Male | Terminated for Cause | 6 | 42.86 |
| 9 | Male | Voluntarily Terminated | 37 | 42.05 |
g=sns.catplot(data=df_sex_over_status,
x="empl_status",
order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
y="percent_of_status",
hue="sex",
height=5,
aspect=2,
kind="bar",
palette="Pastel1")
plt.xticks(fontsize=12)  # Set the X-axis tick label size
plt.yticks(fontsize=12)  # Set the Y-axis tick label size
g.set_xlabels("Employment status", fontsize=14)  # X-axis label and its size
g.set_ylabels("% of Employees per employment status", fontsize=14)  # Y-axis label and its size
# Figure title:
g.fig.suptitle("Distribution of employees by sex, broken down by employment status,\n"
               "% within each employment status group",
               fontsize=20, y=1.125)
plt.show()
CONCLUSION
There is no meaningful relationship between the distribution of employees by sex and their employment status. Women outnumber men in every status. The only noticeable difference is the relatively larger share of women among employees on leave of absence and among those due to start work in the future.
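One way to put a number on "no meaningful relationship" is a chi-square test of independence on the counts from the table above. The sketch below computes the statistic by hand in plain Python; it is an illustrative addition, not a method used in the notebook. The critical value 9.488 is the standard table value for a 5% significance level and 4 degrees of freedom.

```python
# Observed counts by sex and employment status, from the query output above
statuses = [
    "Active",
    "Future Start",
    "Leave of Absence",
    "Terminated for Cause",
    "Voluntarily Terminated",
]
observed = {
    "Female": [101, 8, 9, 8, 51],
    "Male": [82, 3, 5, 6, 37],
}

row_totals = {sex: sum(counts) for sex, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Sum of (observed - expected)^2 / expected over all cells
chi2 = 0.0
for sex, counts in observed.items():
    for obs, col_total in zip(counts, col_totals):
        expected = row_totals[sex] * col_total / grand_total
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(statuses) - 1)
CRITICAL_0_05_DOF_4 = 9.488  # table value for alpha = 0.05 and 4 degrees of freedom

print(f"chi2 = {chi2:.2f}, dof = {dof}, "
      f"reject independence: {chi2 > CRITICAL_0_05_DOF_4}")
```

With these counts the statistic is roughly 1.7, far below the critical value, which is consistent with the conclusion above. The result should be read with caution: one expected cell count is slightly below 5, where the chi-square approximation becomes rough.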
# Select race/ethnicity and employment status
sql_quiery = \
"""
WITH
race_status AS
(SELECT
racedesc,
"Employment Status" AS empl_status,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
"Employment Status",
racedesc
ORDER BY
racedesc,
"Employment Status"
)
SELECT
racedesc AS "Race Description",
empl_status,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY empl_status))*100, 2)
AS percent_of_status
FROM
race_status
GROUP BY
empl_status,
racedesc,
employee_count
ORDER BY
racedesc,
empl_status
;"""
df_race_over_status = pd.read_sql(sql_quiery, conn)
df_race_over_status
| | Race Description | empl_status | employee_count | percent_of_status |
|---|---|---|---|---|
| 0 | American Indian or Alaska Native | Active | 3 | 1.64 |
| 1 | American Indian or Alaska Native | Leave of Absence | 1 | 7.14 |
| 2 | Asian | Active | 18 | 9.84 |
| 3 | Asian | Future Start | 1 | 9.09 |
| 4 | Asian | Leave of Absence | 4 | 28.57 |
| 5 | Asian | Voluntarily Terminated | 11 | 12.50 |
| 6 | Black or African American | Active | 36 | 19.67 |
| 7 | Black or African American | Future Start | 2 | 18.18 |
| 8 | Black or African American | Leave of Absence | 2 | 14.29 |
| 9 | Black or African American | Terminated for Cause | 5 | 35.71 |
| 10 | Black or African American | Voluntarily Terminated | 12 | 13.64 |
| 11 | Hispanic | Active | 2 | 1.09 |
| 12 | Hispanic | Leave of Absence | 1 | 7.14 |
| 13 | Hispanic | Voluntarily Terminated | 1 | 1.14 |
| 14 | Two or more races | Active | 9 | 4.92 |
| 15 | Two or more races | Future Start | 1 | 9.09 |
| 16 | Two or more races | Leave of Absence | 1 | 7.14 |
| 17 | Two or more races | Terminated for Cause | 1 | 7.14 |
| 18 | Two or more races | Voluntarily Terminated | 6 | 6.82 |
| 19 | White | Active | 115 | 62.84 |
| 20 | White | Future Start | 7 | 63.64 |
| 21 | White | Leave of Absence | 5 | 35.71 |
| 22 | White | Terminated for Cause | 8 | 57.14 |
| 23 | White | Voluntarily Terminated | 58 | 65.91 |
g=sns.catplot(data=df_race_over_status,
x="empl_status",
order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
y="percent_of_status",
hue="Race Description",
height=5,
aspect=2,
kind="bar",
palette="Dark2",
alpha=1)
plt.xticks(fontsize=12)  # Set the X-axis tick label size
plt.yticks(fontsize=12)  # Set the Y-axis tick label size
g.set_xlabels("Employment status", fontsize=14)  # X-axis label and its size
g.set_ylabels("% of Employees per employment status", fontsize=14)  # Y-axis label and its size
# Figure title:
g.fig.suptitle("Distribution of employees by race/ethnicity, broken down by employment status,\n"
               "% within each employment status group",
               fontsize=20, y=1.125)
#g.ax.set_yscale('log')
plt.show()
CONCLUSION
No particularly significant trends appear in the distribution of racial and ethnic groups across employment statuses. There are some specific features.
Are there differences between the overall distribution by marital status and the within-group distribution by employment status?
# Select marital status and employment status
sql_quiery = \
"""
WITH
marital_status AS
(SELECT
maritaldesc,
"Employment Status" AS empl_status,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
"Employment Status",
maritaldesc
ORDER BY
maritaldesc,
"Employment Status"
)
SELECT
maritaldesc,
empl_status,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY empl_status))*100, 2)
AS percent_of_status
FROM
marital_status
GROUP BY
empl_status,
maritaldesc,
employee_count
ORDER BY
maritaldesc,
empl_status
;"""
df_marital_over_status = pd.read_sql(sql_quiery, conn)
df_marital_over_status
| | maritaldesc | empl_status | employee_count | percent_of_status |
|---|---|---|---|---|
| 0 | Divorced | Active | 14 | 7.65 |
| 1 | Divorced | Voluntarily Terminated | 16 | 18.18 |
| 2 | Married | Active | 65 | 35.52 |
| 3 | Married | Future Start | 5 | 45.45 |
| 4 | Married | Leave of Absence | 8 | 57.14 |
| 5 | Married | Terminated for Cause | 5 | 35.71 |
| 6 | Married | Voluntarily Terminated | 40 | 45.45 |
| 7 | Separated | Active | 8 | 4.37 |
| 8 | Separated | Future Start | 1 | 9.09 |
| 9 | Separated | Leave of Absence | 2 | 14.29 |
| 10 | Separated | Voluntarily Terminated | 1 | 1.14 |
| 11 | Single | Active | 92 | 50.27 |
| 12 | Single | Future Start | 5 | 45.45 |
| 13 | Single | Leave of Absence | 4 | 28.57 |
| 14 | Single | Terminated for Cause | 8 | 57.14 |
| 15 | Single | Voluntarily Terminated | 28 | 31.82 |
| 16 | Widowed | Active | 4 | 2.19 |
| 17 | Widowed | Terminated for Cause | 1 | 7.14 |
| 18 | Widowed | Voluntarily Terminated | 3 | 3.41 |
g=sns.catplot(data=df_marital_over_status,
x="empl_status",
order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
y="percent_of_status",
hue="maritaldesc",
height=5,
aspect=2,
kind="bar",
palette="Paired",
alpha=1)
plt.xticks(fontsize=12)  # Set the X-axis tick label size
plt.yticks(fontsize=12)  # Set the Y-axis tick label size
g.set_xlabels("Employment status", fontsize=14)  # X-axis label and its size
g.set_ylabels("% of Employees per employment status", fontsize=14)  # Y-axis label and its size
# Figure title:
g.fig.suptitle("Distribution of employees by marital status, broken down by employment status,\n"
               "% within each employment status group",
               fontsize=20, y=1.125)
#g.ax.set_yscale('log')
plt.show()
CONCLUSION
As noted above, single employees make up the bulk of the workforce. The breakdown by employment status, however, reveals the following.
# Select state of residence and employment status
sql_quiery = \
"""
WITH
state_status AS
(SELECT
state,
"Employment Status" AS empl_status,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
"Employment Status",
state
ORDER BY
state,
"Employment Status"
)
SELECT
state,
empl_status,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY empl_status))*100, 2)
AS percent_of_status
FROM
state_status
GROUP BY
empl_status,
state,
employee_count
ORDER BY
state,
empl_status
;"""
df_state_over_status = pd.read_sql(sql_quiery, conn)
df_state_over_status
| | state | empl_status | employee_count | percent_of_status |
|---|---|---|---|---|
| 0 | AL | Active | 1 | 0.55 |
| 1 | AZ | Active | 1 | 0.55 |
| 2 | CA | Active | 1 | 0.55 |
| 3 | CO | Active | 1 | 0.55 |
| 4 | CT | Active | 4 | 2.19 |
| 5 | CT | Leave of Absence | 1 | 7.14 |
| 6 | CT | Terminated for Cause | 1 | 7.14 |
| 7 | FL | Active | 1 | 0.55 |
| 8 | GA | Active | 1 | 0.55 |
| 9 | ID | Active | 1 | 0.55 |
| 10 | IN | Active | 1 | 0.55 |
| 11 | KY | Active | 1 | 0.55 |
| 12 | MA | Active | 155 | 84.70 |
| 13 | MA | Future Start | 10 | 90.91 |
| 14 | MA | Leave of Absence | 13 | 92.86 |
| 15 | MA | Terminated for Cause | 12 | 85.71 |
| 16 | MA | Voluntarily Terminated | 85 | 96.59 |
| 17 | ME | Active | 1 | 0.55 |
| 18 | MT | Active | 1 | 0.55 |
| 19 | NC | Active | 1 | 0.55 |
| 20 | ND | Active | 1 | 0.55 |
| 21 | NH | Active | 1 | 0.55 |
| 22 | NV | Active | 1 | 0.55 |
| 23 | NY | Active | 1 | 0.55 |
| 24 | OH | Terminated for Cause | 1 | 7.14 |
| 25 | OR | Active | 1 | 0.55 |
| 26 | PA | Voluntarily Terminated | 1 | 1.14 |
| 27 | RI | Active | 1 | 0.55 |
| 28 | TN | Voluntarily Terminated | 1 | 1.14 |
| 29 | TX | Active | 2 | 1.09 |
| 30 | TX | Future Start | 1 | 9.09 |
| 31 | UT | Active | 1 | 0.55 |
| 32 | VA | Voluntarily Terminated | 1 | 1.14 |
| 33 | VT | Active | 2 | 1.09 |
| 34 | WA | Active | 1 | 0.55 |
CONCLUSION
The distribution of employees' states of residence by employment status differs little from the overall distribution. There are no notable dependencies here. This is unsurprising, since only a handful of employees live outside Massachusetts.
# Select citizenship and employment status
sql_quiery = \
"""
WITH
citizen_status AS
(SELECT
citizendesc,
"Employment Status" AS empl_status,
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
"Employment Status",
citizendesc
ORDER BY
citizendesc,
"Employment Status"
)
SELECT
citizendesc,
empl_status,
employee_count,
ROUND(
employee_count / (SUM(employee_count) OVER (PARTITION BY empl_status))*100, 2)
AS percent_of_status
FROM
citizen_status
GROUP BY
empl_status,
citizendesc,
employee_count
ORDER BY
citizendesc,
empl_status
;"""
df_citizen_over_status = pd.read_sql(sql_quiery, conn)
df_citizen_over_status
| | citizendesc | empl_status | employee_count | percent_of_status |
|---|---|---|---|---|
| 0 | Eligible NonCitizen | Active | 7 | 3.83 |
| 1 | Eligible NonCitizen | Voluntarily Terminated | 5 | 5.68 |
| 2 | Non-Citizen | Active | 1 | 0.55 |
| 3 | Non-Citizen | Voluntarily Terminated | 3 | 3.41 |
| 4 | US Citizen | Active | 175 | 95.63 |
| 5 | US Citizen | Future Start | 11 | 100.00 |
| 6 | US Citizen | Leave of Absence | 14 | 100.00 |
| 7 | US Citizen | Terminated for Cause | 14 | 100.00 |
| 8 | US Citizen | Voluntarily Terminated | 80 | 90.91 |
CONCLUSION
As with state of residence, citizenship shows no meaningful trend across employment statuses.
# Build a table for examining how age depends on department and position
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL department]') AS position,
COUNT("Employee Number"),
MIN(age) AS min_age,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY age) AS median_age,
AVG(age) AS avg_age,
MAX(age) AS max_age
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP(
department,
position
)
;"""
df_age_over_deptmnt_position = pd.read_sql(sql_quiery, conn, index_col=['department', 'position'])
df_age_over_deptmnt_position
| department | position | count | min_age | median_age | avg_age | max_age |
|---|---|---|---|---|---|---|
| Admin Offices | Accountant I | 3 | 30 | 31.0 | 31.666667 | 34 |
| | Administrative Assistant | 2 | 30 | 31.0 | 31.000000 | 32 |
| | Shared Services Manager | 1 | 33 | 33.0 | 33.000000 | 33 |
| | Sr. Accountant | 2 | 31 | 35.0 | 35.000000 | 39 |
| | [SUBTOTAL department] | 8 | 30 | 31.5 | 32.500000 | 39 |
| Executive Office | President & CEO | 1 | 63 | 63.0 | 63.000000 | 63 |
| | [SUBTOTAL department] | 1 | 63 | 63.0 | 63.000000 | 63 |
| IT/IS | BI Developer | 4 | 28 | 32.0 | 32.500000 | 38 |
| | BI Director | 1 | 46 | 46.0 | 46.000000 | 46 |
| | CIO | 1 | 38 | 38.0 | 38.000000 | 38 |
| | Data Architect | 1 | 45 | 45.0 | 45.000000 | 45 |
| | Database Administrator | 8 | 29 | 33.5 | 35.125000 | 48 |
| | IT Director | 1 | 37 | 37.0 | 37.000000 | 37 |
| | IT Manager - DB | 1 | 45 | 45.0 | 45.000000 | 45 |
| | IT Manager - Infra | 1 | 31 | 31.0 | 31.000000 | 31 |
| | IT Manager - Support | 1 | 47 | 47.0 | 47.000000 | 47 |
| | IT Support | 4 | 29 | 38.5 | 38.750000 | 49 |
| | Network Engineer | 8 | 28 | 33.5 | 34.875000 | 49 |
| | Senior BI Developer | 3 | 30 | 36.0 | 38.000000 | 48 |
| | Sr. DBA | 1 | 31 | 31.0 | 31.000000 | 31 |
| | Sr. Network Engineer | 5 | 32 | 40.0 | 45.000000 | 66 |
| | [SUBTOTAL department] | 40 | 28 | 36.0 | 37.600000 | 66 |
| Production | Director of Operations | 1 | 35 | 35.0 | 35.000000 | 35 |
| | Production Manager | 9 | 34 | 40.0 | 39.888889 | 49 |
| | Production Technician I | 84 | 25 | 39.0 | 39.428571 | 67 |
| | Production Technician II | 31 | 25 | 34.0 | 36.419355 | 54 |
| | [SUBTOTAL department] | 125 | 25 | 38.0 | 38.680000 | 67 |
| Sales | Area Sales Manager | 24 | 27 | 34.5 | 38.416667 | 63 |
| | Director of Sales | 1 | 52 | 52.0 | 52.000000 | 52 |
| | Sales Manager | 2 | 28 | 30.5 | 30.500000 | 33 |
| | [SUBTOTAL department] | 27 | 27 | 33.0 | 38.333333 | 63 |
| Software Engineering | Software Engineer | 6 | 30 | 33.0 | 34.000000 | 39 |
| | Software Engineering Manager | 1 | 51 | 51.0 | 51.000000 | 51 |
| | [SUBTOTAL department] | 7 | 30 | 35.0 | 36.428571 | 51 |
| [TOTAL] | [SUBTOTAL department] | 208 | 25 | 37.0 | 38.230769 | 67 |
CONCLUSION
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL Department]') AS position,
COUNT("Employee Number") AS empl_count,
-- Cast the 0/100 indicator to NUMERIC(9,2) before summing so that the division is exact rather than integer division
ROUND(
SUM(CAST(CASE WHEN sex='Male' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/ COUNT(sex), 2)
AS perc_male,
ROUND(
SUM(CAST(CASE WHEN sex='Female' THEN 1 else 0 END * 100 AS NUMERIC (9,2)))/ COUNT(sex), 2)
AS perc_female,
ROUND(
SUM(CAST(CASE WHEN maritaldesc='Married' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/ COUNT(maritaldesc), 2)
AS perc_marry,
ROUND(
SUM(CAST(CASE WHEN maritaldesc='Divorced' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/ COUNT(maritaldesc),2)
AS perc_divorce,
ROUND(
SUM(CAST(CASE WHEN maritaldesc='Single' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/ COUNT(maritaldesc), 2)
AS perc_single,
ROUND(
SUM(CAST(CASE WHEN maritaldesc='Separated' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/ COUNT(maritaldesc), 2)
AS perc_separate,
ROUND(
SUM(CAST(CASE WHEN maritaldesc='Widowed' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/ COUNT(maritaldesc), 2)
AS perc_widow
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(department,
position
)
ORDER BY
department,
position
;"""
df_sex_marital_over_dptmnt_pstn = pd.read_sql(sql_quiery, conn, index_col=['department',
'position'])
df_sex_marital_over_dptmnt_pstn
| department | position | empl_count | perc_male | perc_female | perc_marry | perc_divorce | perc_single | perc_separate | perc_widow |
|---|---|---|---|---|---|---|---|---|---|
| Admin Offices | Accountant I | 3 | 66.67 | 33.33 | 33.33 | 33.33 | 33.33 | 0.0 | 0.00 |
| | Administrative Assistant | 2 | 0.00 | 100.00 | 50.00 | 0.00 | 50.00 | 0.0 | 0.00 |
| | Shared Services Manager | 1 | 100.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| | Sr. Accountant | 2 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| | [SUBTOTAL Department] | 8 | 37.50 | 62.50 | 62.50 | 12.50 | 25.00 | 0.0 | 0.00 |
| Executive Office | President & CEO | 1 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| | [SUBTOTAL Department] | 1 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| IT/IS | BI Developer | 4 | 75.00 | 25.00 | 75.00 | 0.00 | 25.00 | 0.0 | 0.00 |
| | BI Director | 1 | 100.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| | CIO | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | Data Architect | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | Database Administrator | 8 | 37.50 | 62.50 | 62.50 | 12.50 | 25.00 | 0.0 | 0.00 |
| | IT Director | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | IT Manager - DB | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | IT Manager - Infra | 1 | 100.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| | IT Manager - Support | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | IT Support | 4 | 25.00 | 75.00 | 25.00 | 0.00 | 75.00 | 0.0 | 0.00 |
| | Network Engineer | 8 | 50.00 | 50.00 | 62.50 | 12.50 | 12.50 | 0.0 | 0.00 |
| | Senior BI Developer | 3 | 66.67 | 33.33 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | Sr. DBA | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | Sr. Network Engineer | 5 | 60.00 | 40.00 | 60.00 | 0.00 | 20.00 | 0.0 | 20.00 |
| | [SUBTOTAL Department] | 40 | 52.50 | 47.50 | 47.50 | 5.00 | 42.50 | 0.0 | 2.50 |
| Production | Director of Operations | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | Production Manager | 9 | 55.56 | 44.44 | 33.33 | 33.33 | 33.33 | 0.0 | 0.00 |
| | Production Technician I | 84 | 38.10 | 61.90 | 36.90 | 7.14 | 47.62 | 0.0 | 3.57 |
| | Production Technician II | 31 | 38.71 | 61.29 | 29.03 | 3.23 | 54.84 | 0.0 | 0.00 |
| | [SUBTOTAL Department] | 125 | 39.20 | 60.80 | 34.40 | 8.00 | 48.80 | 0.0 | 2.40 |
| Sales | Area Sales Manager | 24 | 58.33 | 41.67 | 33.33 | 0.00 | 58.33 | 0.0 | 0.00 |
| | Director of Sales | 1 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.0 | 0.00 |
| | Sales Manager | 2 | 50.00 | 50.00 | 0.00 | 50.00 | 50.00 | 0.0 | 0.00 |
| | [SUBTOTAL Department] | 27 | 55.56 | 44.44 | 33.33 | 3.70 | 55.56 | 0.0 | 0.00 |
| Software Engineering | Software Engineer | 6 | 16.67 | 83.33 | 16.67 | 0.00 | 83.33 | 0.0 | 0.00 |
| | Software Engineering Manager | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.0 | 0.00 |
| | [SUBTOTAL Department] | 7 | 28.57 | 71.43 | 14.29 | 0.00 | 85.71 | 0.0 | 0.00 |
| [TOTAL] | [SUBTOTAL Department] | 208 | 43.27 | 56.73 | 37.50 | 6.73 | 48.56 | 0.0 | 1.92 |
CONCLUSION
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL Department]') AS position,
COUNT("Employee Number") AS empl_count,
-- Cast the 0/100 indicator to NUMERIC(9,2) before summing so that the division is exact rather than integer division
ROUND(
SUM(CAST(CASE WHEN racedesc='Black or African American' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/
count(racedesc), 2) AS perc_Black_AfAm,
ROUND(
SUM(CAST(CASE WHEN racedesc='White' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/
count(racedesc), 2) AS perc_White,
ROUND(
SUM(CAST(CASE WHEN racedesc='Asian' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/
count(racedesc), 2) AS perc_Asian,
ROUND(
SUM(CAST(CASE WHEN racedesc='Two or more races' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/
count(racedesc), 2) AS perc_2_or_more_race,
ROUND(
SUM(CAST(CASE WHEN racedesc='Hispanic' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/
count(racedesc), 2) AS perc_Hispanic,
ROUND(
SUM(CAST(CASE WHEN racedesc='American Indian or Alaska Native' THEN 1 ELSE 0 END * 100 AS NUMERIC (9,2)))/
count(racedesc), 2) AS perc_Indian_or_Alaska
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(department,
position
)
ORDER BY
department,
position
;"""
df_race_over_dptmnt_pstn = pd.read_sql(sql_quiery, conn, index_col=['department',
'position'])
df_race_over_dptmnt_pstn
| department | position | empl_count | perc_black_afam | perc_white | perc_asian | perc_2_or_more_race | perc_hispanic | perc_indian_or_alaska |
|---|---|---|---|---|---|---|---|---|
| Admin Offices | Accountant I | 3 | 66.67 | 33.33 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Administrative Assistant | 2 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Shared Services Manager | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Sr. Accountant | 2 | 0.00 | 50.00 | 50.00 | 0.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 8 | 25.00 | 62.50 | 12.50 | 0.00 | 0.00 | 0.00 |
| Executive Office | President & CEO | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| IT/IS | BI Developer | 4 | 50.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | BI Director | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | CIO | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Data Architect | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Database Administrator | 8 | 12.50 | 62.50 | 25.00 | 0.00 | 0.00 | 0.00 |
| | IT Director | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | IT Manager - DB | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | IT Manager - Infra | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 |
| | IT Manager - Support | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | IT Support | 4 | 25.00 | 50.00 | 0.00 | 25.00 | 0.00 | 0.00 |
| | Network Engineer | 8 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Senior BI Developer | 3 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 |
| | Sr. DBA | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Sr. Network Engineer | 5 | 0.00 | 60.00 | 40.00 | 0.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 40 | 17.50 | 60.00 | 17.50 | 2.50 | 2.50 | 0.00 |
| Production | Director of Operations | 1 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Production Manager | 9 | 11.11 | 77.78 | 0.00 | 0.00 | 11.11 | 0.00 |
| | Production Technician I | 84 | 16.67 | 63.10 | 14.29 | 4.76 | 0.00 | 1.19 |
| | Production Technician II | 31 | 22.58 | 61.29 | 3.23 | 3.23 | 3.23 | 6.45 |
| | [SUBTOTAL Department] | 125 | 18.40 | 63.20 | 10.40 | 4.00 | 1.60 | 2.40 |
| Sales | Area Sales Manager | 24 | 25.00 | 45.83 | 4.17 | 20.83 | 0.00 | 4.17 |
| | Director of Sales | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Sales Manager | 2 | 50.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 27 | 25.93 | 48.15 | 3.70 | 18.52 | 0.00 | 3.70 |
| Software Engineering | Software Engineer | 6 | 16.67 | 66.67 | 16.67 | 0.00 | 0.00 | 0.00 |
| | Software Engineering Manager | 1 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 7 | 14.29 | 71.43 | 14.29 | 0.00 | 0.00 | 0.00 |
| [TOTAL] | [SUBTOTAL Department] | 208 | 19.23 | 61.06 | 11.06 | 5.29 | 1.44 | 1.92 |
CONCLUSION
The table shows no fundamental relationship between race/ethnicity and an employee's place in the company's organizational structure.
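The percentage columns in the two queries above share one conditional-aggregation pattern: count the rows that match a literal, multiply by 100, and divide by the number of non-NULL rows. A literal that matches no rows silently produces 0, so one cheap safeguard is to confirm that the shares over an exhaustive list of categories add up to roughly 100 (allowing for rounding). The sketch below illustrates this with SQLite standing in for PostgreSQL; the table `staff`, the column `category`, and the labels are hypothetical.

```python
import sqlite3

# Hypothetical toy data; the labels "A", "B", "C" are illustrative only.
records = [("A",), ("A",), ("B",), ("C",)]

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE staff (category TEXT)")
demo.executemany("INSERT INTO staff VALUES (?)", records)

def share_sql(label):
    # Same shape as the report's expressions: matching rows * 100 / non-NULL rows.
    return (f"ROUND(SUM(CASE WHEN category = '{label}' THEN 1 ELSE 0 END) "
            f"* 100.0 / COUNT(category), 2)")

def shares(labels):
    # One percentage column per label, evaluated over the whole table.
    cols = ", ".join(share_sql(label) for label in labels)
    return demo.execute(f"SELECT {cols} FROM staff").fetchone()

complete = shares(["A", "B", "C"])
misspelled = shares(["A", "B", "Cc"])  # deliberate typo in the last label

print(complete, sum(complete))
print(misspelled, sum(misspelled))
```

When the total falls well short of 100, either a category is missing from the list or one of the literals does not match the stored values.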
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL Department]') AS position,
COALESCE(state, '[SUBTOTAL state]') AS state,
COUNT("Employee Number") AS empl_count,
-- The percentage computed below must be multiplied by 2, because ROLLUP adds a subtotal row
-- to every position group.
-- The window total (SUM(COUNT("Employee Number")) OVER (PARTITION BY position)) therefore counts
-- each employee twice in the denominator, which halves the percentages of the individual rows
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY position))*100*2, 2)
AS percent_of_state
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(department,
position,
state
)
ORDER BY
department,
position,
state
;"""
df_state_over_dptmnt_pstn = pd.read_sql(sql_quiery, conn, index_col=['department', 'position', 'state'])
df_state_over_dptmnt_pstn
| department | position | state | empl_count | percent_of_state |
|---|---|---|---|---|
| Admin Offices | Accountant I | MA | 3 | 100.00 |
| | | [SUBTOTAL state] | 3 | 100.00 |
| | Administrative Assistant | MA | 2 | 100.00 |
| | | [SUBTOTAL state] | 2 | 100.00 |
| | Shared Services Manager | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | Sr. Accountant | MA | 2 | 100.00 |
| | | [SUBTOTAL state] | 2 | 100.00 |
| | [SUBTOTAL Department] | [SUBTOTAL state] | 8 | 3.85 |
| Executive Office | President & CEO | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | [SUBTOTAL Department] | [SUBTOTAL state] | 1 | 0.48 |
| IT/IS | BI Developer | MA | 4 | 100.00 |
| | | [SUBTOTAL state] | 4 | 100.00 |
| | BI Director | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | CIO | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | Data Architect | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | Database Administrator | MA | 7 | 87.50 |
| | | TX | 1 | 12.50 |
| | | [SUBTOTAL state] | 8 | 100.00 |
| | IT Director | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | IT Manager - DB | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | IT Manager - Infra | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | IT Manager - Support | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | IT Support | CT | 2 | 50.00 |
| | | MA | 2 | 50.00 |
| | | [SUBTOTAL state] | 4 | 100.00 |
| | Network Engineer | MA | 8 | 100.00 |
| | | [SUBTOTAL state] | 8 | 100.00 |
| | Senior BI Developer | MA | 3 | 100.00 |
| | | [SUBTOTAL state] | 3 | 100.00 |
| | Sr. DBA | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | Sr. Network Engineer | CT | 2 | 40.00 |
| | | MA | 3 | 60.00 |
| | | [SUBTOTAL state] | 5 | 100.00 |
| | [SUBTOTAL Department] | [SUBTOTAL state] | 40 | 19.23 |
| Production | Director of Operations | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | Production Manager | MA | 9 | 100.00 |
| | | [SUBTOTAL state] | 9 | 100.00 |
| | Production Technician I | MA | 84 | 100.00 |
| | | [SUBTOTAL state] | 84 | 100.00 |
| | Production Technician II | MA | 31 | 100.00 |
| | | [SUBTOTAL state] | 31 | 100.00 |
| | [SUBTOTAL Department] | [SUBTOTAL state] | 125 | 60.10 |
| Sales | Area Sales Manager | AL | 1 | 4.17 |
| | | AZ | 1 | 4.17 |
| | | CA | 1 | 4.17 |
| | | CO | 1 | 4.17 |
| | | CT | 1 | 4.17 |
| | | FL | 1 | 4.17 |
| | | GA | 1 | 4.17 |
| | | ID | 1 | 4.17 |
| | | IN | 1 | 4.17 |
| | | KY | 1 | 4.17 |
| | | MA | 1 | 4.17 |
| | | ME | 1 | 4.17 |
| | | MT | 1 | 4.17 |
| | | NC | 1 | 4.17 |
| | | ND | 1 | 4.17 |
| | | NH | 1 | 4.17 |
| | | NV | 1 | 4.17 |
| | | NY | 1 | 4.17 |
| | | OR | 1 | 4.17 |
| | | TX | 2 | 8.33 |
| | | UT | 1 | 4.17 |
| | | VT | 1 | 4.17 |
| | | WA | 1 | 4.17 |
| | | [SUBTOTAL state] | 24 | 100.00 |
| | Director of Sales | RI | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | Sales Manager | MA | 1 | 50.00 |
| | | VT | 1 | 50.00 |
| | | [SUBTOTAL state] | 2 | 100.00 |
| | [SUBTOTAL Department] | [SUBTOTAL state] | 27 | 12.98 |
| Software Engineering | Software Engineer | MA | 6 | 100.00 |
| | | [SUBTOTAL state] | 6 | 100.00 |
| | Software Engineering Manager | MA | 1 | 100.00 |
| | | [SUBTOTAL state] | 1 | 100.00 |
| | [SUBTOTAL Department] | [SUBTOTAL state] | 7 | 3.37 |
| [TOTAL] | [SUBTOTAL Department] | [SUBTOTAL state] | 208 | 100.00 |
CONCLUSION
The table explains why a considerable number of employees live in states far from Massachusetts (see section 1.1.3.6). Leaving aside the states that border Massachusetts, the remote states are home mainly to Area Sales Managers, plus a few IT/IS employees. The latter either are able to work remotely or support the regional sales managers.
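The factor of 2 in the query above comes from the subtotal rows that `ROLLUP` adds: a window sum partitioned by position runs over both the detail rows and the subtotal row, so every employee is counted twice. The following pure-Python sketch reproduces that effect for the position/state level only (the real query also rolls up by department); the employee list is hypothetical.

```python
from collections import Counter

# Hypothetical (position, state) pairs; not taken from hr_dataset.
employees = [
    ("Area Sales Manager", "TX"), ("Area Sales Manager", "TX"),
    ("Area Sales Manager", "CA"), ("IT Support", "MA"),
]

detail = Counter(employees)                      # GROUP BY position, state
subtotal = Counter(pos for pos, _ in employees)  # extra ROLLUP row per position

# Rows as ROLLUP would return them: detail rows plus one subtotal row per position.
rollup_rows = [(pos, state, n) for (pos, state), n in detail.items()]
rollup_rows += [(pos, "[SUBTOTAL state]", n) for pos, n in subtotal.items()]

# SUM(...) OVER (PARTITION BY position) covers detail rows AND the subtotal row.
window_sum = Counter()
for pos, _, n in rollup_rows:
    window_sum[pos] += n

# Multiplying by 2 compensates for the doubled denominator.
percent = {
    (pos, state): round(n / window_sum[pos] * 100 * 2, 2)
    for pos, state, n in rollup_rows
}

print(window_sum["Area Sales Manager"])  # twice the three employees in the group
print(percent)
```

The correction holds only while each partition contains exactly one subtotal row; a denominator computed from the detail rows alone would avoid that dependency.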
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL Department]') AS position,
COUNT("Employee Number") AS empl_count,
ROUND(
CAST(SUM(CASE WHEN citizendesc='US Citizen' THEN 1 ELSE 0 END * 100) AS NUMERIC(9,2))/
COUNT(citizendesc), 2) AS perc_US,
ROUND(
CAST(SUM(CASE WHEN citizendesc='Eligible NonCitizen' THEN 1 ELSE 0 END * 100) AS NUMERIC(9,2))/
COUNT(citizendesc), 2) AS perc_Eligible_nonUS,
ROUND(
CAST(SUM(CASE WHEN citizendesc='Non-Citizen' THEN 1 ELSE 0 END * 100) AS NUMERIC(9,2))/
COUNT(citizendesc), 2) AS perc_Non_US
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP
(department,
position
)
ORDER BY
department,
position
;"""
df_citizen_over_dptmnt_pstn = pd.read_sql(sql_quiery, conn, index_col=['department',
'position'])
df_citizen_over_dptmnt_pstn
| department | position | empl_count | perc_us | perc_eligible_nonus | perc_non_us |
|---|---|---|---|---|---|
| Admin Offices | Accountant I | 3 | 100.00 | 0.00 | 0.00 |
| | Administrative Assistant | 2 | 100.00 | 0.00 | 0.00 |
| | Shared Services Manager | 1 | 100.00 | 0.00 | 0.00 |
| | Sr. Accountant | 2 | 100.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 8 | 100.00 | 0.00 | 0.00 |
| Executive Office | President & CEO | 1 | 100.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 1 | 100.00 | 0.00 | 0.00 |
| IT/IS | BI Developer | 4 | 100.00 | 0.00 | 0.00 |
| | BI Director | 1 | 100.00 | 0.00 | 0.00 |
| | CIO | 1 | 100.00 | 0.00 | 0.00 |
| | Data Architect | 1 | 100.00 | 0.00 | 0.00 |
| | Database Administrator | 8 | 100.00 | 0.00 | 0.00 |
| | IT Director | 1 | 100.00 | 0.00 | 0.00 |
| | IT Manager - DB | 1 | 100.00 | 0.00 | 0.00 |
| | IT Manager - Infra | 1 | 0.00 | 100.00 | 0.00 |
| | IT Manager - Support | 1 | 100.00 | 0.00 | 0.00 |
| | IT Support | 4 | 100.00 | 0.00 | 0.00 |
| | Network Engineer | 8 | 87.50 | 12.50 | 0.00 |
| | Senior BI Developer | 3 | 100.00 | 0.00 | 0.00 |
| | Sr. DBA | 1 | 100.00 | 0.00 | 0.00 |
| | Sr. Network Engineer | 5 | 100.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 40 | 95.00 | 5.00 | 0.00 |
| Production | Director of Operations | 1 | 100.00 | 0.00 | 0.00 |
| | Production Manager | 9 | 100.00 | 0.00 | 0.00 |
| | Production Technician I | 84 | 96.43 | 2.38 | 1.19 |
| | Production Technician II | 31 | 93.55 | 6.45 | 0.00 |
| | [SUBTOTAL Department] | 125 | 96.00 | 3.20 | 0.80 |
| Sales | Area Sales Manager | 24 | 95.83 | 4.17 | 0.00 |
| | Director of Sales | 1 | 100.00 | 0.00 | 0.00 |
| | Sales Manager | 2 | 100.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 27 | 96.30 | 3.70 | 0.00 |
| Software Engineering | Software Engineer | 6 | 100.00 | 0.00 | 0.00 |
| | Software Engineering Manager | 1 | 100.00 | 0.00 | 0.00 |
| | [SUBTOTAL Department] | 7 | 100.00 | 0.00 | 0.00 |
| [TOTAL] | [SUBTOTAL Department] | 208 | 96.15 | 3.37 | 0.48 |
CONCLUSION
As shown above, only 8 of the 208 current employees are not US citizens. They mostly hold rank-and-file positions in the company. Their distribution across the organizational structure reveals no trends.
NOTE
- Comparing salary figures across the company as a whole is like comparing patients' temperatures across an entire hospital: it is invalid and clearly distorts the data. Salaries can be compared only between employees at similar levels of the hierarchy and with similar working conditions. The comparison and analysis of salary dependencies will therefore be carried out by department and position. Otherwise it would be indiscriminate.
- Salaries may have changed over the company's lifetime. To analyze salary dependencies against the company's current structure, only current employees are included, because their data are up to date. Data on former employees are historical, spread over time, and could heavily distort the results and conclusions.
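The first point can be illustrated with a small sketch that contrasts a company-wide median with per-position medians. The salary figures below are hypothetical and chosen only to make the effect visible; they are not taken from `hr_dataset`.

```python
from collections import defaultdict
from statistics import median

# Hypothetical annual salaries in USD; not taken from hr_dataset.
staff = [
    ("Production Technician I", 39000), ("Production Technician I", 41000),
    ("Production Technician I", 43000), ("Area Sales Manager", 112000),
    ("Area Sales Manager", 116000),
]

# Group salaries by position so each median is computed among comparable roles.
by_position = defaultdict(list)
for position, salary in staff:
    by_position[position].append(salary)

company_median = median(salary for _, salary in staff)
position_medians = {pos: median(vals) for pos, vals in by_position.items()}

print(company_median)
print(position_medians)
```

The company-wide figure mostly reflects which position has the larger headcount, whereas the per-position medians describe pay within comparable roles.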
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL]') AS position,
COUNT("Employee Number"),
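-- 2080 = 52 weeks * 40 hours; the ::integer cast rounds each hourly figure to whole dollars before it is annualized
SUM("Pay Rate")::integer * 2080 AS total_pay_rate,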
MIN("Pay Rate")::integer * 2080 AS min_pay_rate,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "Pay Rate")::integer * 2080 AS median_pay_rate,
MAX("Pay Rate")::integer * 2080 AS max_pay_rate,
(MAX("Pay Rate") - MIN("Pay Rate"))::integer * 2080 AS delta_pay_rate,
AVG("Pay Rate")::integer * 2080 AS avg_pay_rate
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP(
department,
position
)
ORDER BY
department,
position
;"""
df_salary_dptmnt_pstn = pd.read_sql(sql_quiery, conn, index_col=['department',
'position']
)
df_salary_dptmnt_pstn
| department | position | count | total_pay_rate | min_pay_rate | median_pay_rate | max_pay_rate | delta_pay_rate | avg_pay_rate |
|---|---|---|---|---|---|---|---|---|
| Admin Offices | Accountant I | 3 | 166400 | 47840 | 58240 | 60320 | 12480 | 56160 |
| | Administrative Assistant | 2 | 79040 | 35360 | 39520 | 45760 | 10400 | 39520 |
| | Shared Services Manager | 1 | 114400 | 114400 | 114400 | 114400 | 0 | 114400 |
| | Sr. Accountant | 2 | 145600 | 72800 | 72800 | 72800 | 0 | 72800 |
| | [SUBTOTAL] | 8 | 505440 | 35360 | 60320 | 114400 | 79040 | 62400 |
| Executive Office | President & CEO | 1 | 166400 | 166400 | 166400 | 166400 | 0 | 166400 |
| | [SUBTOTAL] | 1 | 166400 | 166400 | 166400 | 166400 | 0 | 166400 |
| IT/IS | BI Developer | 4 | 376480 | 93600 | 93600 | 95680 | 2080 | 93600 |
| | BI Director | 1 | 133120 | 133120 | 133120 | 133120 | 0 | 133120 |
| | CIO | 1 | 135200 | 135200 | 135200 | 135200 | 0 | 135200 |
| | Data Architect | 1 | 114400 | 114400 | 114400 | 114400 | 0 | 114400 |
| | Database Administrator | 8 | 615680 | 62400 | 79040 | 89440 | 27040 | 76960 |
| | IT Director | 1 | 135200 | 135200 | 135200 | 135200 | 0 | 135200 |
| | IT Manager - DB | 1 | 128960 | 128960 | 128960 | 128960 | 0 | 128960 |
| | IT Manager - Infra | 1 | 131040 | 131040 | 131040 | 131040 | 0 | 131040 |
| | IT Manager - Support | 1 | 133120 | 133120 | 133120 | 133120 | 0 | 133120 |
| | IT Support | 4 | 237120 | 54080 | 58240 | 64480 | 10400 | 58240 |
| | Network Engineer | 8 | 684320 | 56160 | 87360 | 101920 | 45760 | 85280 |
| | Senior BI Developer | 3 | 320320 | 104000 | 106080 | 108160 | 4160 | 106080 |
| | Sr. DBA | 1 | 126880 | 126880 | 126880 | 126880 | 0 | 126880 |
| | Sr. Network Engineer | 5 | 565760 | 110240 | 112320 | 116480 | 6240 | 112320 |
| | [SUBTOTAL] | 40 | 3835520 | 54080 | 93600 | 135200 | 81120 | 95680 |
| Production | Director of Operations | 1 | 124800 | 124800 | 124800 | 124800 | 0 | 124800 |
| | Production Manager | 9 | 1002560 | 106080 | 112320 | 114400 | 8320 | 112320 |
| | Production Technician I | 84 | 3355040 | 29120 | 41600 | 52000 | 22880 | 39520 |
| | Production Technician II | 31 | 1618240 | 45760 | 52000 | 60320 | 14560 | 52000 |
| | [SUBTOTAL] | 125 | 6102720 | 29120 | 45760 | 124800 | 95680 | 47840 |
| Sales | Area Sales Manager | 24 | 2758080 | 112320 | 114400 | 118560 | 6240 | 114400 |
| | Director of Sales | 1 | 124800 | 124800 | 124800 | 124800 | 0 | 124800 |
| | Sales Manager | 2 | 228800 | 112320 | 114400 | 116480 | 4160 | 114400 |
| | [SUBTOTAL] | 27 | 3111680 | 112320 | 114400 | 124800 | 12480 | 114400 |
| Software Engineering | Software Engineer | 6 | 651040 | 99840 | 108160 | 118560 | 20800 | 108160 |
| | Software Engineering Manager | 1 | 56160 | 56160 | 56160 | 56160 | 0 | 56160 |
| | [SUBTOTAL] | 7 | 707200 | 56160 | 101920 | 118560 | 62400 | 101920 |
| [TOTAL] | [SUBTOTAL] | 208 | 14431040 | 29120 | 54080 | 166400 | 137280 | 68640 |
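The annual figures in the table above come from expressions of the form `"Pay Rate"::integer * 2080`, in which the cast is applied before the multiplication. The sketch below shows how the order of rounding and scaling changes the result. It assumes that `"Pay Rate"` is an hourly rate; Python's `round` stands in for the SQL cast, and the rate of 19.03 is a hypothetical value.

```python
HOURS_PER_YEAR = 2080  # 52 weeks * 40 hours, the factor used in the query

def annual_rounded_first(hourly_rate):
    # Mirrors "Pay Rate"::integer * 2080: round the hourly rate, then scale.
    return round(hourly_rate) * HOURS_PER_YEAR

def annual_scaled_first(hourly_rate):
    # Alternative order: scale to a yearly amount, then round to whole dollars.
    return round(hourly_rate * HOURS_PER_YEAR)

hourly = 19.03  # hypothetical hourly rate in USD
rounded_first = annual_rounded_first(hourly)
scaled_first = annual_scaled_first(hourly)

print(rounded_first, scaled_first, scaled_first - rounded_first)
```

Rounding first makes every annual figure a multiple of 2080 and can shift it by up to about 1,040 USD, which matters when salaries within one position differ by only a few thousand dollars.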
CONCLUSION
sql_quiery = \
"""
-- Build a query and dataframe with employment status and annual pay by position,
-- for positions held by several employees
-- It is used to draw the grid of plots
-- Select positions that are or were held by several employees, whether those employees are still working
-- or have left
WITH
PositionsSchedule AS
(SELECT
department,
position,
COUNT("Employee Number") AS empl_count
FROM hr_dataset
GROUP BY
department,
position
HAVING COUNT("Employee Number") > 1
ORDER BY
department,
position
),
EmployeesBySchedule AS
(SELECT
department,
position,
"Employee Number",
"Employee Name",
"Employment Status",
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM hr_dataset
ORDER BY
department,
position,
"USD per Year",
"Employee Number"
)
SELECT
*
FROM
PositionsSchedule
LEFT JOIN
EmployeesBySchedule
USING (department, position)
;
"""
dfg_salary_over_status = pd.read_sql(sql_quiery, conn)
#dfg_salary_over_status
# Build a grid of plots showing salary by employment status,
# broken down by position and department
g=sns.catplot(
kind="strip",
x="position",
y="USD per Year",
hue="Employment Status",
data=dfg_salary_over_status,
col="department",
col_wrap=3,
height=4.5,
aspect = 0.8,
palette="tab10",
margin_titles=True,
sharex=False,
sharey=False,
marker='D',
s=10,
jitter=True,
alpha=0.9)
g.fig.suptitle(
"Annual salary by employment status for comparable positions",
fontsize=16, x=0.50, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
# Set the font size and rotation of the tick labels
g.set_xticklabels(rotation=90,fontsize=9)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
NOTE
In some records an employee with the status "Active" has a termination date but no termination reason. In such cases the termination date is ignored and the employee is treated as current.
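The rule in this note can be sketched as a small function that mirrors the `CASE` expression in the query below: the employment status alone decides the group, and any termination date is ignored. The function name and the `termination_date` parameter are hypothetical and serve only as an illustration.

```python
# Statuses treated as "current", matching the CASE expression in the query.
CURRENT_STATUSES = {"Active", "Leave of Absence", "Future Start"}

def employment_group(status, termination_date=None):
    # termination_date is accepted but deliberately ignored: status alone decides.
    return "Current" if status in CURRENT_STATUSES else "Terminated"

# An "Active" record stays current even when a termination date is present.
print(employment_group("Active", "2015-01-01"))
print(employment_group("Voluntarily Terminated", "2015-01-01"))
```

Keeping the rule in one place makes it easier to revisit if the inconsistent records are later corrected at the source.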
sql_quiery = \
"""
WITH
schedule AS -- keep only the needed hr_dataset fields and classify each employee as current or terminated
(SELECT
department,
position,
(CASE WHEN ("Employment Status" = 'Active')
OR ("Employment Status" = 'Leave of Absence')
OR ("Employment Status" = 'Future Start')
THEN 'Current'
ELSE 'Terminated'
END) AS status,
ROUND("Pay Rate"::numeric * 2080, 2) as "USD per Year",
"Date of Termination"
FROM hr_dataset
),
terminated_schedule AS -- salary and termination date of employees who have left the company
(SELECT
department,
position,
"USD per Year",
"Date of Termination"
FROM
schedule
WHERE
status = 'Terminated'
),
current_schedule AS -- current employees, used to compute the current median salary per position
(SELECT
department,
position,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "USD per Year") AS "Current Median"
FROM
schedule
WHERE
status = 'Current' AND
position IN (SELECT
position
FROM
terminated_schedule)
GROUP BY
department,
position
)
SELECT -- attach the current median salary of each position to the terminated employees' records
department,
position,
"Current Median",
"USD per Year",
"Date of Termination"
FROM
current_schedule
RIGHT JOIN
terminated_schedule
USING(department,
position)
;"""
dfg_term_salary_history_over_current_median = pd.read_sql(sql_quiery, conn)
dfg_term_salary_history_over_current_median
| | department | position | Current Median | USD per Year | Date of Termination |
|---|---|---|---|---|---|
| 0 | Admin Offices | Administrative Assistant | 39582.4 | 42640.0 | 2013-09-25 |
| 1 | Admin Offices | Shared Services Manager | 114400.0 | 114400.0 | 2015-08-15 |
| 2 | IT/IS | Database Administrator | 78052.0 | 89440.0 | 2015-09-12 |
| 3 | IT/IS | Database Administrator | 78052.0 | 100880.0 | 2015-03-15 |
| 4 | IT/IS | Database Administrator | 78052.0 | 83408.0 | 2015-02-22 |
| 5 | IT/IS | Database Administrator | 78052.0 | 85280.0 | 2016-05-01 |
| 6 | IT/IS | Database Administrator | 78052.0 | 93600.0 | 2015-10-31 |
| 7 | IT/IS | IT Manager - DB | 128960.0 | 43680.0 | 2015-11-04 |
| 8 | IT/IS | Network Engineer | 88400.0 | 58240.0 | 2015-05-12 |
| 9 | IT/IS | Sr. DBA | 127504.0 | 128960.0 | 2016-06-16 |
| 10 | IT/IS | Sr. DBA | 127504.0 | 121056.0 | 2016-02-19 |
| 11 | IT/IS | Sr. DBA | 127504.0 | 121680.0 | 2015-11-10 |
| 12 | Production | Production Manager | 112320.0 | 105040.0 | 2014-08-07 |
| 13 | Production | Production Manager | 112320.0 | 100880.0 | 2015-12-12 |
| 14 | Production | Production Manager | 112320.0 | 87360.0 | 2012-09-24 |
| 15 | Production | Production Manager | 112320.0 | 80080.0 | 2016-05-18 |
| 16 | Production | Production Manager | 112320.0 | 69680.0 | 2012-01-02 |
| 17 | Production | Production Technician I | 41340.0 | 33280.0 | 2011-09-06 |
| 18 | Production | Production Technician I | 41340.0 | 35360.0 | 2011-01-12 |
| 19 | Production | Production Technician I | 41340.0 | 45760.0 | 2012-09-19 |
| 20 | Production | Production Technician I | 41340.0 | 37440.0 | 2013-04-06 |
| 21 | Production | Production Technician I | 41340.0 | 41600.0 | 2013-06-15 |
| 22 | Production | Production Technician I | 41340.0 | 43680.0 | 2015-11-15 |
| 23 | Production | Production Technician I | 41340.0 | 31200.0 | 2012-09-23 |
| 24 | Production | Production Technician I | 41340.0 | 48880.0 | 2015-06-08 |
| 25 | Production | Production Technician I | 41340.0 | 35360.0 | 2013-06-06 |
| 26 | Production | Production Technician I | 41340.0 | 35360.0 | 2014-09-27 |
| 27 | Production | Production Technician I | 41340.0 | 37440.0 | 2014-02-25 |
| 28 | Production | Production Technician I | 41340.0 | 41600.0 | 2014-05-17 |
| 29 | Production | Production Technician I | 41340.0 | 45760.0 | 2011-11-15 |
| 30 | Production | Production Technician I | 41340.0 | 31200.0 | 2015-06-25 |
| 31 | Production | Production Technician I | 41340.0 | 29120.0 | 2014-01-11 |
| 32 | Production | Production Technician I | 41340.0 | 41080.0 | 2015-12-15 |
| 33 | Production | Production Technician I | 41340.0 | 35360.0 | 2016-04-29 |
| 34 | Production | Production Technician I | 41340.0 | 43680.0 | 2016-04-01 |
| 35 | Production | Production Technician I | 41340.0 | 45760.0 | 2015-06-04 |
| 36 | Production | Production Technician I | 41340.0 | 49920.0 | 2012-01-09 |
| 37 | Production | Production Technician I | 41340.0 | 39520.0 | 2013-08-19 |
| 38 | Production | Production Technician I | 41340.0 | 31200.0 | 2011-08-04 |
| 39 | Production | Production Technician I | 41340.0 | 31720.0 | 2011-09-26 |
| 40 | Production | Production Technician I | 41340.0 | 39520.0 | 2015-11-14 |
| 41 | Production | Production Technician I | 41340.0 | 31200.0 | 2012-09-26 |
| 42 | Production | Production Technician I | 41340.0 | 29120.0 | 2015-11-11 |
| 43 | Production | Production Technician I | 41340.0 | 37440.0 | 2014-04-04 |
| 44 | Production | Production Technician I | 41340.0 | 45760.0 | 2016-05-25 |
| 45 | Production | Production Technician I | 41340.0 | 38480.0 | 2016-05-01 |
| 46 | Production | Production Technician I | 41340.0 | 37440.0 | 2014-01-12 |
| 47 | Production | Production Technician I | 41340.0 | 49920.0 | 2012-12-28 |
| 48 | Production | Production Technician I | 41340.0 | 45760.0 | 2015-10-25 |
| 49 | Production | Production Technician I | 41340.0 | 37440.0 | 2012-11-30 |
| 50 | Production | Production Technician I | 41340.0 | 39520.0 | 2011-06-04 |
| 51 | Production | Production Technician I | 41340.0 | 33280.0 | 2013-06-18 |
| 52 | Production | Production Technician I | 41340.0 | 43680.0 | 2012-04-07 |
| 53 | Production | Production Technician I | 41340.0 | 41600.0 | 2016-01-15 |
| 54 | Production | Production Technician I | 41340.0 | 49920.0 | 2016-01-26 |
| 55 | Production | Production Technician I | 41340.0 | 33280.0 | 2016-05-17 |
| 56 | Production | Production Technician I | 41340.0 | 37440.0 | 2012-08-13 |
| 57 | Production | Production Technician I | 41340.0 | 43680.0 | 2010-07-30 |
| 58 | Production | Production Technician I | 41340.0 | 43680.0 | 2015-06-29 |
| 59 | Production | Production Technician I | 41340.0 | 31200.0 | 2013-04-01 |
| 60 | Production | Production Technician I | 41340.0 | 41600.0 | 2011-09-05 |
| 61 | Production | Production Technician I | 41340.0 | 41600.0 | 2011-05-14 |
| 62 | Production | Production Technician I | 41340.0 | 39520.0 | 2016-02-05 |
| 63 | Production | Production Technician I | 41340.0 | 43680.0 | 2016-02-08 |
| 64 | Production | Production Technician I | 41340.0 | 31200.0 | 2015-09-01 |
| 65 | Production | Production Technician I | 41340.0 | 47840.0 | 2011-05-15 |
| 66 | Production | Production Technician I | 41340.0 | 43680.0 | 2015-09-07 |
| 67 | Production | Production Technician I | 41340.0 | 33280.0 | 2015-06-27 |
| 68 | Production | Production Technician I | 41340.0 | 45760.0 | 2015-09-29 |
| 69 | Production | Production Technician II | 52000.0 | 60320.0 | 2012-09-24 |
| 70 | Production | Production Technician II | 52000.0 | 54080.0 | 2014-04-04 |
| 71 | Production | Production Technician II | 52000.0 | 52000.0 | 2013-01-07 |
| 72 | Production | Production Technician II | 52000.0 | 54080.0 | 2011-09-26 |
| 73 | Production | Production Technician II | 52000.0 | 60320.0 | 2015-11-04 |
| 74 | Production | Production Technician II | 52000.0 | 60320.0 | 2013-06-24 |
| 75 | Production | Production Technician II | 52000.0 | 49920.0 | 2012-01-09 |
| 76 | Production | Production Technician II | 52000.0 | 60320.0 | 2011-05-30 |
| 77 | Production | Production Technician II | 52000.0 | 49920.0 | 2013-02-18 |
| 78 | Production | Production Technician II | 52000.0 | 58240.0 | 2012-04-07 |
| 79 | Production | Production Technician II | 52000.0 | 47840.0 | 2013-04-01 |
| 80 | Production | Production Technician II | 52000.0 | 45760.0 | 2014-03-31 |
| 81 | Production | Production Technician II | 52000.0 | 58240.0 | 2013-04-15 |
| 82 | Production | Production Technician II | 52000.0 | 49920.0 | 2013-09-15 |
| 83 | Production | Production Technician II | 52000.0 | 47840.0 | 2011-08-19 |
| 84 | Production | Production Technician II | 52000.0 | 52000.0 | 2014-09-04 |
| 85 | Production | Production Technician II | 52000.0 | 52000.0 | 2013-08-19 |
| 86 | Production | Production Technician II | 52000.0 | 58240.0 | 2011-09-15 |
| 87 | Production | Production Technician II | 52000.0 | 60320.0 | 2012-02-04 |
| 88 | Production | Production Technician II | 52000.0 | 54080.0 | 2011-10-22 |
| 89 | Production | Production Technician II | 52000.0 | 45760.0 | 2012-02-08 |
| 90 | Production | Production Technician II | 52000.0 | 47840.0 | 2015-04-08 |
| 91 | Production | Production Technician II | 52000.0 | 59800.0 | 2012-07-08 |
| 92 | Production | Production Technician II | 52000.0 | 47840.0 | 2010-08-30 |
| 93 | Production | Production Technician II | 52000.0 | 45760.0 | 2012-07-02 |
| 94 | Production | Production Technician II | 52000.0 | 60320.0 | 2016-02-21 |
| 95 | Sales | Area Sales Manager | 114400.0 | 114400.0 | 2014-08-02 |
| 96 | Sales | Area Sales Manager | 114400.0 | 114400.0 | 2015-09-05 |
| 97 | Sales | Area Sales Manager | 114400.0 | 114400.0 | 2014-10-31 |
| 98 | Sales | Sales Manager | 114400.0 | 125320.0 | 2014-04-24 |
| 99 | Software Engineering | Software Engineer | 108950.4 | 100880.0 | 2013-06-05 |
| 100 | Software Engineering | Software Engineer | 108950.4 | 108680.0 | 2015-09-07 |
| 101 | Software Engineering | Software Engineer | 108950.4 | 94473.6 | 2014-04-15 |
# Build a grid of scatter plots: salaries of terminated employees against termination date, by position and department
g = sns.relplot(
    x="Date of Termination",
    y="USD per Year",
    data=dfg_term_salary_history_over_current_median,
    col="position",
    hue="department",
    col_wrap=3,
    height=4,
    aspect=1,
    palette="Set1",
    kind='scatter',
    facet_kws=dict(margin_titles=True, sharex=False, sharey=True, ylim=(0, 150000)),
)
# Add a horizontal line for the current median salary
# To do this, define a function that draws the horizontal line and its label
def median_lines(y, **kwargs):
    y = y.astype(float)
    # Draw the median line from the received y values: color, width, style
    plt.axhline(y.median(), color='navy', linewidth=1, linestyle='-.')
    # Create the label for the median line
    plt.annotate(
        text=f"current median {y.median()}",  # Median line annotation
        xy=(0, 1),  # Label position (top-left corner, so it does not cover the data points)
        xycoords="axes fraction",
        horizontalalignment='left',  # Horizontal text alignment
        verticalalignment='top',  # Vertical text alignment
        rotation=None,  # Label rotation
        color='navy',  # Label color
        alpha=1,
        fontsize=10
    )
# Apply the median-line function to every facet of the grid
# Pass it the values of the 'Current Median' column
g.map(median_lines, 'Current Median')
g.fig.suptitle("Salaries of terminated (resigned) employees compared with the median salary of current employees"
               , fontsize=16, x=0.45, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size and rotation of the X-axis tick labels
g.set_xticklabels(rotation=90, fontsize=9)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=0.5)
plt.show()
CONCLUSION
NOTE
It is also worth noting here that this relationship is specific to the position held. To detect it, we therefore consider only positions with more than one employee. For this we use the temporary view obtained above.
sql_quiery = \
"""
-- Select the positions held by several current employees
-- This selection will be needed several more times, so create a temporary view
CREATE OR REPLACE TEMP VIEW
    PositionsWithMultActiveEmployees
AS
WITH
PositionsActiveSchedule AS
    (SELECT
        department,
        position,
        COUNT("Employee Number") AS empl_count
    FROM hr_dataset
    WHERE -- Condition: the employee is current
        ("Employment Status" = 'Active' OR
         "Employment Status" = 'Future Start' OR
         "Employment Status" = 'Leave of Absence')
    GROUP BY
        department,
        position
    ORDER BY
        department,
        position
    ),
EmployeesByActiveSchedule AS
    (SELECT
        department,
        position,
        "Employee Number",
        "Employee Name",
        "Employment Status"
    FROM hr_dataset
    WHERE -- Condition: the employee is current
        ("Employment Status" = 'Active' OR
         "Employment Status" = 'Future Start' OR
         "Employment Status" = 'Leave of Absence')
    ORDER BY
        department,
        position,
        "Employee Number"
    )
SELECT
    *
FROM
    PositionsActiveSchedule
LEFT JOIN
    EmployeesByActiveSchedule
    USING (department, position)
WHERE
    empl_count > 1
;
SELECT
    *
FROM
    PositionsWithMultActiveEmployees
;
"""
df_PositionsWithMultActiveEmployees = pd.read_sql(sql_quiery, conn)
df_PositionsWithMultActiveEmployees
| | department | position | empl_count | Employee Number | Employee Name | Employment Status |
|---|---|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | 3 | 1103024456 | Brown, Mia | Active |
| 1 | Admin Offices | Accountant I | 3 | 1106026572 | LaRotonda, William | Active |
| 2 | Admin Offices | Accountant I | 3 | 1302053333 | Steans, Tyrone | Active |
| 3 | Admin Offices | Administrative Assistant | 2 | 1211050782 | Howard, Estelle | Active |
| 4 | Admin Offices | Administrative Assistant | 2 | 1307059817 | Singh, Nan | Active |
| 5 | Admin Offices | Sr. Accountant | 2 | 1201031308 | Foster-Baker, Amy | Active |
| 6 | Admin Offices | Sr. Accountant | 2 | 1307060188 | Boutwell, Bonalyn | Active |
| 7 | IT/IS | BI Developer | 4 | 1009919940 | Rachael, Maggie | Active |
| 8 | IT/IS | BI Developer | 4 | 1009919980 | Smith, Jason | Active |
| 9 | IT/IS | BI Developer | 4 | 1009919990 | Westinghouse, Matthew | Active |
| 10 | IT/IS | BI Developer | 4 | 1009920000 | Hubert, Robert | Active |
| 11 | IT/IS | Database Administrator | 8 | 808010278 | Simard, Kramer | Active |
| 12 | IT/IS | Database Administrator | 8 | 1003018246 | Johnson, Noelle | Leave of Absence |
| 13 | IT/IS | Database Administrator | 8 | 1105025718 | Horton, Jayne | Active |
| 14 | IT/IS | Database Administrator | 8 | 1108027853 | Petrowsky, Thelma | Active |
| 15 | IT/IS | Database Administrator | 8 | 1110029732 | Zhou, Julia | Active |
| 16 | IT/IS | Database Administrator | 8 | 1203032255 | Rogers, Ivan | Active |
| 17 | IT/IS | Database Administrator | 8 | 1406068403 | Murray, Thomas | Active |
| 18 | IT/IS | Database Administrator | 8 | 1407068885 | Roby, Lori | Active |
| 19 | IT/IS | IT Support | 4 | 602000312 | Lindsay, Leonara | Active |
| 20 | IT/IS | IT Support | 4 | 1203032263 | Soto, Julia | Active |
| 21 | IT/IS | IT Support | 4 | 1301052902 | Clayton, Rick | Active |
| 22 | IT/IS | IT Support | 4 | 1501072093 | Galia, Lisa | Active |
| 23 | IT/IS | Network Engineer | 8 | 906014183 | Shepard, Anita | Active |
| 24 | IT/IS | Network Engineer | 8 | 1001956578 | Morway, Tanya | Active |
| 25 | IT/IS | Network Engineer | 8 | 1012023013 | Merlos, Carlos | Active |
| 26 | IT/IS | Network Engineer | 8 | 1101023540 | Dolan, Linda | Active |
| 27 | IT/IS | Network Engineer | 8 | 1102024173 | Cisco, Anthony | Active |
| 28 | IT/IS | Network Engineer | 8 | 1212052023 | Bacong, Alejandro | Active |
| 29 | IT/IS | Network Engineer | 8 | 1411071506 | Turpin, Jumil | Active |
| 30 | IT/IS | Network Engineer | 8 | 1988299991 | Gonzalez, Maria | Active |
| 31 | IT/IS | Senior BI Developer | 3 | 1009919930 | Le, Binh | Active |
| 32 | IT/IS | Senior BI Developer | 3 | 1009919960 | Navathe, Kurt | Active |
| 33 | IT/IS | Senior BI Developer | 3 | 1009919970 | Wang, Charlie | Active |
| 34 | IT/IS | Sr. Network Engineer | 5 | 904013591 | Semizoglou, Jeremiah | Future Start |
| 35 | IT/IS | Sr. Network Engineer | 5 | 1108028108 | Lajiri, Jyoti | Leave of Absence |
| 36 | IT/IS | Sr. Network Engineer | 5 | 1301052347 | Warfield, Sarah | Active |
| 37 | IT/IS | Sr. Network Engineer | 5 | 1308060959 | South, Joe | Active |
| 38 | IT/IS | Sr. Network Engineer | 5 | 1411071312 | Daniele, Ann | Leave of Absence |
| 39 | Production | Production Manager | 9 | 1000974650 | Stanley, David | Active |
| 40 | Production | Production Manager | 9 | 1102024149 | Spirea, Kelley | Active |
| 41 | Production | Production Manager | 9 | 1103024679 | Liebig, Ketsia | Active |
| 42 | Production | Production Manager | 9 | 1107027351 | Miller, Brannon | Active |
| 43 | Production | Production Manager | 9 | 1110029990 | Butler, Webster L | Active |
| 44 | Production | Production Manager | 9 | 1307060077 | Gray, Elijiah | Active |
| 45 | Production | Production Manager | 9 | 1405067298 | Sullivan, Kissy | Active |
| 46 | Production | Production Manager | 9 | 1409070147 | Dunn, Amy | Active |
| 47 | Production | Production Manager | 9 | 1501072311 | Albert, Michael | Active |
| 48 | Production | Production Technician I | 84 | 706006285 | Dickinson, Geoff | Active |
| 49 | Production | Production Technician I | 84 | 710007555 | Rose, Ashley | Active |
| 50 | Production | Production Technician I | 84 | 803009012 | Ferreira, Violeta | Active |
| 51 | Production | Production Technician I | 84 | 807010161 | Sewkumar, Nori | Leave of Absence |
| 52 | Production | Production Technician I | 84 | 909015167 | Mckenna, Sandy | Active |
| 53 | Production | Production Technician I | 84 | 1001109612 | Darson, Jene'ya | Active |
| 54 | Production | Production Technician I | 84 | 1001735072 | Pitt, Brad | Active |
| 55 | Production | Production Technician I | 84 | 1002017900 | Heitzman, Anthony | Active |
| 56 | Production | Production Technician I | 84 | 1006020020 | Fidelia, Libby | Active |
| 57 | Production | Production Technician I | 84 | 1007020403 | Engdahl, Jean | Active |
| 58 | Production | Production Technician I | 84 | 1011022883 | Alagbe,Trina | Active |
| 59 | Production | Production Technician I | 84 | 1011022887 | Robinson, Elias | Active |
| 60 | Production | Production Technician I | 84 | 1012023152 | Trang, Mei | Active |
| 61 | Production | Production Technician I | 84 | 1012023295 | Cierpiszewski, Caroline | Active |
| 62 | Production | Production Technician I | 84 | 1101023353 | Lydon, Allison | Leave of Absence |
| 63 | Production | Production Technician I | 84 | 1101023612 | England, Rex | Active |
| 64 | Production | Production Technician I | 84 | 1101023679 | Barone, Francesco A | Active |
| 65 | Production | Production Technician I | 84 | 1102024121 | Motlagh, Dawn | Active |
| 66 | Production | Production Technician I | 84 | 1103024335 | Owad, Clinton | Active |
| 67 | Production | Production Technician I | 84 | 1104025414 | Garneau, Hamish | Active |
| 68 | Production | Production Technician I | 84 | 1105026041 | Gaul, Barbara | Active |
| 69 | Production | Production Technician I | 84 | 1106026474 | Von Massenbach, Anna | Future Start |
| 70 | Production | Production Technician I | 84 | 1106026579 | Gordon, David | Active |
| 71 | Production | Production Technician I | 84 | 1106026896 | Stoica, Rick | Active |
| 72 | Production | Production Technician I | 84 | 1109029256 | Medeiros, Jennifer | Active |
| 73 | Production | Production Technician I | 84 | 1109029366 | Bernstein, Sean | Active |
| 74 | Production | Production Technician I | 84 | 1110029777 | Becker, Scott | Leave of Absence |
| 75 | Production | Production Technician I | 84 | 1111030129 | Chang, Donovan E | Active |
| 76 | Production | Production Technician I | 84 | 1111030244 | Stanford,Barbara M | Active |
| 77 | Production | Production Technician I | 84 | 1201031310 | Sullivan, Timothy | Active |
| 78 | Production | Production Technician I | 84 | 1201031438 | Jackson, Maryellen | Active |
| 79 | Production | Production Technician I | 84 | 1202031618 | Dobrin, Denisa S | Active |
| 80 | Production | Production Technician I | 84 | 1203032357 | Nguyen, Lei-Ming | Active |
| 81 | Production | Production Technician I | 84 | 1204032927 | Girifalco, Evelyn | Active |
| 82 | Production | Production Technician I | 84 | 1205033102 | Shields, Seffi | Active |
| 83 | Production | Production Technician I | 84 | 1208048062 | Chace, Beatrice | Active |
| 84 | Production | Production Technician I | 84 | 1209048696 | DiNocco, Lily | Active |
| 85 | Production | Production Technician I | 84 | 1209049259 | Mahoney, Lauren | Active |
| 86 | Production | Production Technician I | 84 | 1211051232 | Zima, Colleen | Active |
| 87 | Production | Production Technician I | 84 | 1212051409 | Bachiochi, Linda | Leave of Absence |
| 88 | Production | Production Technician I | 84 | 1301052124 | Athwal, Sam | Active |
| 89 | Production | Production Technician I | 84 | 1301052462 | Keatts, Kramer | Active |
| 90 | Production | Production Technician I | 84 | 1302053044 | Newman, Richard | Leave of Absence |
| 91 | Production | Production Technician I | 84 | 1302053339 | Fernandes, Nilson | Active |
| 92 | Production | Production Technician I | 84 | 1302053362 | Sander, Kamrin | Active |
| 93 | Production | Production Technician I | 84 | 1304055683 | Knapp, Bradley J | Active |
| 94 | Production | Production Technician I | 84 | 1304055947 | Anderson, Linda | Active |
| 95 | Production | Production Technician I | 84 | 1304055987 | Langton, Enrico | Active |
| 96 | Production | Production Technician I | 84 | 1305057282 | Chan, Lin | Active |
| 97 | Production | Production Technician I | 84 | 1307059944 | Peterson, Kayla | Active |
| 98 | Production | Production Technician I | 84 | 1308060366 | Billis, Helen | Active |
| 99 | Production | Production Technician I | 84 | 1308060754 | Mangal, Debbie | Active |
| 100 | Production | Production Technician I | 84 | 1309061015 | Garcia, Raul | Active |
| 101 | Production | Production Technician I | 84 | 1311062610 | Kretschmer, John | Active |
| 102 | Production | Production Technician I | 84 | 1311063114 | Carey, Michael | Active |
| 103 | Production | Production Technician I | 84 | 1311063172 | Crimmings, Jean | Future Start |
| 104 | Production | Production Technician I | 84 | 1312063675 | Goyal, Roxana | Leave of Absence |
| 105 | Production | Production Technician I | 84 | 1401064327 | Maurice, Shana | Active |
| 106 | Production | Production Technician I | 84 | 1401064562 | Punjabhi, Louis | Active |
| 107 | Production | Production Technician I | 84 | 1403066020 | Ngodup, Shari | Active |
| 108 | Production | Production Technician I | 84 | 1403066069 | Cornett, Lisa | Active |
| 109 | Production | Production Technician I | 84 | 1403066194 | Beatrice, Courtney | Active |
| 110 | Production | Production Technician I | 84 | 1404066622 | Harrison, Kara | Active |
| 111 | Production | Production Technician I | 84 | 1404066949 | Osturnka, Adeel | Active |
| 112 | Production | Production Technician I | 84 | 1405067064 | Handschiegl, Joanne | Active |
| 113 | Production | Production Technician I | 84 | 1405067642 | Rivera, Haley | Active |
| 114 | Production | Production Technician I | 84 | 1406068241 | Harrell, Ludwick | Active |
| 115 | Production | Production Technician I | 84 | 1407069061 | Sutwell, Barbara | Active |
| 116 | Production | Production Technician I | 84 | 1407069280 | Clukey, Elijian | Future Start |
| 117 | Production | Production Technician I | 84 | 1408069539 | Gold, Shenice | Active |
| 118 | Production | Production Technician I | 84 | 1408069635 | Bugali, Josephine | Leave of Absence |
| 119 | Production | Production Technician I | 84 | 1408069882 | Ivey, Rose | Active |
| 120 | Production | Production Technician I | 84 | 1409070255 | Tippett, Jeanette | Active |
| 121 | Production | Production Technician I | 84 | 1409070522 | Adinolfi, Wilson K | Active |
| 122 | Production | Production Technician I | 84 | 1410070998 | Saar-Beckles, Melinda | Future Start |
| 123 | Production | Production Technician I | 84 | 1410071137 | Sparks, Taylor | Active |
| 124 | Production | Production Technician I | 84 | 1411071212 | Gonzalez, Cayo | Active |
| 125 | Production | Production Technician I | 84 | 1412071713 | Jhaveri, Sneha | Active |
| 126 | Production | Production Technician I | 84 | 1412071844 | Biden, Lowan M | Active |
| 127 | Production | Production Technician I | 84 | 1501071909 | Smith, Sade | Active |
| 128 | Production | Production Technician I | 84 | 1501072124 | Desimone, Carl | Active |
| 129 | Production | Production Technician I | 84 | 1501072192 | Gentry, Mildred | Active |
| 130 | Production | Production Technician I | 84 | 1503072857 | Jacobi, Hannah | Active |
| 131 | Production | Production Technician I | 84 | 1599991009 | Cockel, James | Active |
| 132 | Production | Production Technician II | 31 | 1001103149 | Monterro, Luisa | Active |
| 133 | Production | Production Technician II | 31 | 1001504432 | Lunquist, Lisa | Active |
| 134 | Production | Production Technician II | 31 | 1001549006 | Good, Susan | Leave of Absence |
| 135 | Production | Production Technician II | 31 | 1001970770 | Smith, Joe | Active |
| 136 | Production | Production Technician II | 31 | 1008020942 | Jeannite, Tayana | Active |
| 137 | Production | Production Technician II | 31 | 1011022818 | Walker, Roger | Active |
| 138 | Production | Production Technician II | 31 | 1011022820 | Burke, Joelle | Active |
| 139 | Production | Production Technician II | 31 | 1012023010 | Woodson, Jason | Active |
| 140 | Production | Production Technician II | 31 | 1101023457 | Buccheri, Joseph | Active |
| 141 | Production | Production Technician II | 31 | 1103024843 | Petingill, Shana | Active |
| 142 | Production | Production Technician II | 31 | 1103024924 | Hutter, Rosalie | Future Start |
| 143 | Production | Production Technician II | 31 | 1104025435 | Nowlan, Kristie | Active |
| 144 | Production | Production Technician II | 31 | 1105025661 | Erilus, Angela | Active |
| 145 | Production | Production Technician II | 31 | 1106026433 | Hunts, Julissa | Future Start |
| 146 | Production | Production Technician II | 31 | 1106026462 | Sahoo, Adil | Active |
| 147 | Production | Production Technician II | 31 | 1108028351 | Gosciminski, Phylicia | Leave of Absence |
| 148 | Production | Production Technician II | 31 | 1108028428 | Faller, Megan | Active |
| 149 | Production | Production Technician II | 31 | 1110029623 | Manchester, Robyn | Future Start |
| 150 | Production | Production Technician II | 31 | 1201031274 | Davis, Daniel | Active |
| 151 | Production | Production Technician II | 31 | 1205033180 | Wolk, Hang T | Active |
| 152 | Production | Production Technician II | 31 | 1301052436 | Moumanil, Maliki | Active |
| 153 | Production | Production Technician II | 31 | 1301052449 | Burkett, Benjamin | Active |
| 154 | Production | Production Technician II | 31 | 1303054329 | Beak, Kimberly | Future Start |
| 155 | Production | Production Technician II | 31 | 1306057810 | Johnston, Yen | Active |
| 156 | Production | Production Technician II | 31 | 1307059937 | Hankard, Earnest | Active |
| 157 | Production | Production Technician II | 31 | 1402065085 | Fancett, Nicole | Active |
| 158 | Production | Production Technician II | 31 | 1403066125 | Blount, Dianna | Active |
| 159 | Production | Production Technician II | 31 | 1404066711 | Monkfish, Erasumus | Active |
| 160 | Production | Production Technician II | 31 | 1405067565 | Linden, Mathew | Leave of Absence |
| 161 | Production | Production Technician II | 31 | 1406067957 | McCarthy, Brigit | Active |
| 162 | Production | Production Technician II | 31 | 1408069503 | Moran, Patrick | Leave of Absence |
| 163 | Sales | Area Sales Manager | 24 | 812011761 | Ozark, Travis | Active |
| 164 | Sales | Area Sales Manager | 24 | 1001084890 | Jeremy Prater | Active |
| 165 | Sales | Area Sales Manager | 24 | 1102024106 | Potts, Xana | Active |
| 166 | Sales | Area Sales Manager | 24 | 1104025008 | Khemmich, Bartholemew | Active |
| 167 | Sales | Area Sales Manager | 24 | 1111030503 | Villanueva, Noah | Active |
| 168 | Sales | Area Sales Manager | 24 | 1111030684 | Nguyen, Dheepa | Active |
| 169 | Sales | Area Sales Manager | 24 | 1203032099 | Givens, Myriam | Active |
| 170 | Sales | Area Sales Manager | 24 | 1204032843 | Friedman, Gerry | Active |
| 171 | Sales | Area Sales Manager | 24 | 1209048771 | Martins, Joseph | Active |
| 172 | Sales | Area Sales Manager | 24 | 1209049326 | McKinzie, Jac | Future Start |
| 173 | Sales | Area Sales Manager | 24 | 1306057978 | Mullaney, Howard | Active |
| 174 | Sales | Area Sales Manager | 24 | 1306059197 | Digitale, Alfred | Active |
| 175 | Sales | Area Sales Manager | 24 | 1312063714 | Valentin,Jackie | Active |
| 176 | Sales | Area Sales Manager | 24 | 1401064637 | Terry, Sharlene | Active |
| 177 | Sales | Area Sales Manager | 24 | 1403065721 | Carter, Michelle | Active |
| 178 | Sales | Area Sales Manager | 24 | 1408069481 | Dietrich, Jenna | Active |
| 179 | Sales | Area Sales Manager | 24 | 1409070567 | Costa, Latia | Active |
| 180 | Sales | Area Sales Manager | 24 | 1411071295 | Strong, Caitrin | Active |
| 181 | Sales | Area Sales Manager | 24 | 1411071302 | Fraval, Maruk | Active |
| 182 | Sales | Area Sales Manager | 24 | 1411071481 | Gonzales, Ricardo | Active |
| 183 | Sales | Area Sales Manager | 24 | 1412071660 | Leruth, Giovanni | Active |
| 184 | Sales | Area Sales Manager | 24 | 1501072180 | Onque, Jasmine | Active |
| 185 | Sales | Area Sales Manager | 24 | 1502072711 | Riordan, Michael | Active |
| 186 | Sales | Area Sales Manager | 24 | 1504073313 | Buck, Edward | Active |
| 187 | Sales | Sales Manager | 2 | 1402065303 | Daneault, Lynn | Active |
| 188 | Sales | Sales Manager | 2 | 1499902910 | Smith, John | Active |
| 189 | Software Engineering | Software Engineer | 6 | 1012023185 | Saada, Adell | Active |
| 190 | Software Engineering | Software Engineer | 6 | 1101023577 | Carabbio, Judith | Active |
| 191 | Software Engineering | Software Engineer | 6 | 1107027358 | Andreola, Colby | Active |
| 192 | Software Engineering | Software Engineer | 6 | 1201031324 | Szabo, Andrew | Active |
| 193 | Software Engineering | Software Engineer | 6 | 1203032498 | Del Bosque, Keyla | Active |
| 194 | Software Engineering | Software Engineer | 6 | 1303054625 | Martin, Sandra | Active |
sql_quiery = \
"""
-- Build a query and a dataframe with tenure and pay rates for positions held by several employees
-- It is used to build the grid of plots
WITH
EmployeesPeriodAndPayrate AS
    (SELECT
        "Employee Number",
        -- Approximate tenure in years (360-day year); this is integer division if "Days Employed" is an integer column
        "Days Employed" / 360 AS "Years Employed",
        -- Annual salary: hourly rate times 2080 working hours per year
        ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
    FROM
        hr_dataset
    )
SELECT
    department,
    position,
    "Years Employed",
    "USD per Year"
FROM
    PositionsWithMultActiveEmployees
LEFT JOIN
    EmployeesPeriodAndPayrate
    USING ("Employee Number")
ORDER BY
    department,
    position,
    "Years Employed",
    "USD per Year"
;
"""
dfg_salary_over_period_position = pd.read_sql(sql_quiery, conn)
#dfg_salary_over_period_position
# Build a grid of plots of salary against length of service, by position and department
g = sns.lmplot(
    x="Years Employed",
    y="USD per Year",
    data=dfg_salary_over_period_position,
    col="department",
    hue="position",
    col_wrap=3,
    height=4,
    aspect=1,
    palette="tab20",
    facet_kws=dict(margin_titles=True, sharex=False, sharey=False),
)
g.fig.suptitle("Annual salary against length of service in the company, in years", fontsize=16, x=0.45, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
plt.show()
CONCLUSION
Across the organization, the relationship between salary and length of service cannot be called clear-cut (for those positions where such a relationship could be determined at all). In some cases it is practically absent or very weak; in others it is "paradoxical" in nature, but even there it is visible only as a tendency.
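The visual impression from the regression lines could be put into numbers with a per-position Pearson correlation between tenure and annual salary. The block below is a minimal sketch, not part of the original analysis: the helper names `pearson` and `correlation_by_position` and the sample rows are hypothetical, and in practice the input would be the rows of dfg_salary_over_period_position.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # no variation in tenure or salary: correlation is undefined
    return cov / sqrt(var_x * var_y)


def correlation_by_position(rows):
    """rows: iterable of (position, years_employed, usd_per_year) tuples."""
    groups = defaultdict(lambda: ([], []))
    for position, years, salary in rows:
        groups[position][0].append(float(years))
        groups[position][1].append(float(salary))
    return {pos: pearson(xs, ys) for pos, (xs, ys) in groups.items()}


# Hypothetical sample rows, for illustration only (not taken from the database)
sample_rows = [
    ("Position A", 1, 40000.0),
    ("Position A", 2, 42000.0),
    ("Position A", 3, 44000.0),
    ("Position B", 1, 50000.0),
    ("Position B", 2, 50000.0),
]
result = correlation_by_position(sample_rows)
print(result)
```

A coefficient near zero would support the reading that the relationship is weak, and a negative one would correspond to the "paradoxical" case. With only a handful of employees per position, such coefficients should be treated as indicative rather than conclusive.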
It is interesting to note that in the IT/IS department positions are separated by length of service, so that the tenure ranges of different positions do not overlap. This may be the result of new positions being opened in stages, with the employees hired into them still working there, or of "campaigns" of mass replacement of employees in a particular position.
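The observation that tenure ranges do not overlap between positions can be checked mechanically rather than by eye. The following is a rough sketch under stated assumptions: the functions `tenure_ranges` and `overlapping_positions` and the sample data are hypothetical, and the real input would be (position, "Years Employed") pairs for a single department.

```python
from itertools import combinations


def tenure_ranges(rows):
    """Map each position to its (min, max) tenure; rows are (position, years_employed)."""
    ranges = {}
    for position, years in rows:
        low, high = ranges.get(position, (years, years))
        ranges[position] = (min(low, years), max(high, years))
    return ranges


def overlapping_positions(rows):
    """Return the pairs of positions whose closed tenure ranges intersect."""
    ranges = tenure_ranges(rows)
    pairs = []
    for (pos_a, (lo_a, hi_a)), (pos_b, (lo_b, hi_b)) in combinations(sorted(ranges.items()), 2):
        if lo_a <= hi_b and lo_b <= hi_a:  # two closed intervals intersect
            pairs.append((pos_a, pos_b))
    return pairs


# Hypothetical sample data, for illustration only (not taken from the database)
sample = [
    ("Role X", 1), ("Role X", 2),
    ("Role Y", 3), ("Role Y", 5),
    ("Role Z", 4),
]
print(overlapping_positions(sample))
```

An empty result for a department would confirm that its positions occupy disjoint tenure ranges.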
sql_quiery = \
"""
-- Build a query and a dataframe with performance scores and pay rates for positions
-- held by several employees
-- It is used to build the grid of plots
-- To do this, create a temporary view with employee numbers, performance scores and pay rates
CREATE OR REPLACE TEMP VIEW
    EmployeesPerformancePayrate AS
SELECT
    "Employee Number",
    "Performance Score",
    "Pay Rate"
FROM
    hr_dataset
;
SELECT
    department,
    position,
    "Performance Score",
    ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM
    PositionsWithMultActiveEmployees
LEFT JOIN
    EmployeesPerformancePayrate
    USING ("Employee Number")
ORDER BY
    department,
    position,
    "USD per Year",
    "Performance Score"
;
"""
dfg_salary_over_performance_position = pd.read_sql(sql_quiery, conn)
dfg_salary_over_performance_position
| | department | position | Performance Score | USD per Year |
|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | Fully Meets | 47840.0 |
| 1 | Admin Offices | Accountant I | Fully Meets | 59280.0 |
| 2 | Admin Offices | Accountant I | Fully Meets | 60320.0 |
| 3 | Admin Offices | Administrative Assistant | N/A- too early to review | 34444.8 |
| 4 | Admin Offices | Administrative Assistant | N/A- too early to review | 44720.0 |
| 5 | Admin Offices | Sr. Accountant | 90-day meets | 72696.0 |
| 6 | Admin Offices | Sr. Accountant | Fully Meets | 72696.0 |
| 7 | IT/IS | BI Developer | Fully Meets | 93600.0 |
| 8 | IT/IS | BI Developer | Fully Meets | 93600.0 |
| 9 | IT/IS | BI Developer | Fully Meets | 93600.0 |
| 10 | IT/IS | BI Developer | Fully Meets | 95680.0 |
| 11 | IT/IS | Database Administrator | 90-day meets | 62816.0 |
| 12 | IT/IS | Database Administrator | 90-day meets | 65312.0 |
| 13 | IT/IS | Database Administrator | N/A- too early to review | 70720.0 |
| 14 | IT/IS | Database Administrator | Exceptional | 73840.0 |
| 15 | IT/IS | Database Administrator | Fully Meets | 82264.0 |
| 16 | IT/IS | Database Administrator | 90-day meets | 83200.0 |
| 17 | IT/IS | Database Administrator | N/A- too early to review | 87776.0 |
| 18 | IT/IS | Database Administrator | Exceptional | 88920.0 |
| 19 | IT/IS | IT Support | Exceeds | 54080.0 |
| 20 | IT/IS | IT Support | Fully Meets | 57179.2 |
| 21 | IT/IS | IT Support | Fully Meets | 60299.2 |
| 22 | IT/IS | IT Support | Fully Meets | 65312.0 |
| 23 | IT/IS | Network Engineer | Fully Meets | 56160.0 |
| 24 | IT/IS | Network Engineer | 90-day meets | 76960.0 |
| 25 | IT/IS | Network Engineer | Fully Meets | 81120.0 |
| 26 | IT/IS | Network Engineer | N/A- too early to review | 87360.0 |
| 27 | IT/IS | Network Engineer | N/A- too early to review | 89440.0 |
| 28 | IT/IS | Network Engineer | 90-day meets | 93600.0 |
| 29 | IT/IS | Network Engineer | Fully Meets | 97760.0 |
| 30 | IT/IS | Network Engineer | N/A- too early to review | 102128.0 |
| 31 | IT/IS | Senior BI Developer | Fully Meets | 104520.0 |
| 32 | IT/IS | Senior BI Developer | Fully Meets | 106080.0 |
| 33 | IT/IS | Senior BI Developer | Fully Meets | 108680.0 |
| 34 | IT/IS | Sr. Network Engineer | 90-day meets | 110240.0 |
| 35 | IT/IS | Sr. Network Engineer | N/A- too early to review | 111904.0 |
| 36 | IT/IS | Sr. Network Engineer | Fully Meets | 112528.0 |
| 37 | IT/IS | Sr. Network Engineer | N/A- too early to review | 114816.0 |
| 38 | IT/IS | Sr. Network Engineer | Fully Meets | 116896.0 |
| 39 | Production | Production Manager | Fully Meets | 106080.0 |
| 40 | Production | Production Manager | Fully Meets | 108160.0 |
| 41 | Production | Production Manager | Fully Meets | 110240.0 |
| 42 | Production | Production Manager | Needs Improvement | 110240.0 |
| 43 | Production | Production Manager | Fully Meets | 112320.0 |
| 44 | Production | Production Manager | Fully Meets | 113360.0 |
| 45 | Production | Production Manager | Exceeds | 114400.0 |
| 46 | Production | Production Manager | Exceeds | 114400.0 |
| 47 | Production | Production Manager | Fully Meets | 114400.0 |
| 48 | Production | Production Technician I | Fully Meets | 29120.0 |
| 49 | Production | Production Technician I | Fully Meets | 29120.0 |
| 50 | Production | Production Technician I | Fully Meets | 29120.0 |
| 51 | Production | Production Technician I | Exceeds | 31200.0 |
| 52 | Production | Production Technician I | Fully Meets | 31200.0 |
| 53 | Production | Production Technician I | Fully Meets | 31200.0 |
| 54 | Production | Production Technician I | Fully Meets | 31200.0 |
| 55 | Production | Production Technician I | Fully Meets | 31200.0 |
| 56 | Production | Production Technician I | Fully Meets | 31200.0 |
| 57 | Production | Production Technician I | Fully Meets | 31200.0 |
| 58 | Production | Production Technician I | Fully Meets | 31616.0 |
| 59 | Production | Production Technician I | 90-day meets | 32760.0 |
| 60 | Production | Production Technician I | Exceeds | 33280.0 |
| 61 | Production | Production Technician I | Exceeds | 33280.0 |
| 62 | Production | Production Technician I | Exceeds | 33280.0 |
| 63 | Production | Production Technician I | Exceeds | 33280.0 |
| 64 | Production | Production Technician I | Fully Meets | 33280.0 |
| 65 | Production | Production Technician I | Fully Meets | 33280.0 |
| 66 | Production | Production Technician I | Fully Meets | 33280.0 |
| 67 | Production | Production Technician I | Fully Meets | 33280.0 |
| 68 | Production | Production Technician I | PIP | 33280.0 |
| 69 | Production | Production Technician I | Fully Meets | 34840.0 |
| 70 | Production | Production Technician I | Fully Meets | 34860.8 |
| 71 | Production | Production Technician I | Exceptional | 35360.0 |
| 72 | Production | Production Technician I | Fully Meets | 35360.0 |
| 73 | Production | Production Technician I | Fully Meets | 35360.0 |
| 74 | Production | Production Technician I | Fully Meets | 35360.0 |
| 75 | Production | Production Technician I | Fully Meets | 35360.0 |
| 76 | Production | Production Technician I | Fully Meets | 35360.0 |
| 77 | Production | Production Technician I | Fully Meets | 35360.0 |
| 78 | Production | Production Technician I | Fully Meets | 35360.0 |
| 79 | Production | Production Technician I | Fully Meets | 37440.0 |
| 80 | Production | Production Technician I | N/A- too early to review | 37440.0 |
| 81 | Production | Production Technician I | 90-day meets | 39520.0 |
| 82 | Production | Production Technician I | 90-day meets | 39520.0 |
| 83 | Production | Production Technician I | Fully Meets | 39520.0 |
| 84 | Production | Production Technician I | Fully Meets | 39520.0 |
| 85 | Production | Production Technician I | Fully Meets | 39520.0 |
| 86 | Production | Production Technician I | N/A- too early to review | 39520.0 |
| 87 | Production | Production Technician I | N/A- too early to review | 39520.0 |
| 88 | Production | Production Technician I | Fully Meets | 40560.0 |
| 89 | Production | Production Technician I | N/A- too early to review | 41080.0 |
| 90 | Production | Production Technician I | 90-day meets | 41600.0 |
| 91 | Production | Production Technician I | 90-day meets | 41600.0 |
| 92 | Production | Production Technician I | Exceeds | 41600.0 |
| 93 | Production | Production Technician I | Exceeds | 41600.0 |
| 94 | Production | Production Technician I | Fully Meets | 41600.0 |
| 95 | Production | Production Technician I | Fully Meets | 41600.0 |
| 96 | Production | Production Technician I | Fully Meets | 41600.0 |
| 97 | Production | Production Technician I | N/A- too early to review | 41600.0 |
| 98 | Production | Production Technician I | N/A- too early to review | 41600.0 |
| 99 | Production | Production Technician I | N/A- too early to review | 41600.0 |
| 100 | Production | Production Technician I | PIP | 41600.0 |
| 101 | Production | Production Technician I | 90-day meets | 43680.0 |
| 102 | Production | Production Technician I | Fully Meets | 43680.0 |
| 103 | Production | Production Technician I | Fully Meets | 43680.0 |
| 104 | Production | Production Technician I | Fully Meets | 43680.0 |
| 105 | Production | Production Technician I | Fully Meets | 43680.0 |
| 106 | Production | Production Technician I | Fully Meets | 43680.0 |
| 107 | Production | Production Technician I | Fully Meets | 43680.0 |
| 108 | Production | Production Technician I | Fully Meets | 43680.0 |
| 109 | Production | Production Technician I | Fully Meets | 44200.0 |
| 110 | Production | Production Technician I | Exceeds | 45760.0 |
| 111 | Production | Production Technician I | Fully Meets | 45760.0 |
| 112 | Production | Production Technician I | Fully Meets | 45760.0 |
| 113 | Production | Production Technician I | Fully Meets | 45760.0 |
| 114 | Production | Production Technician I | Fully Meets | 45760.0 |
| 115 | Production | Production Technician I | Fully Meets | 45760.0 |
| 116 | Production | Production Technician I | Fully Meets | 45760.0 |
| 117 | Production | Production Technician I | N/A- too early to review | 45760.0 |
| 118 | Production | Production Technician I | N/A- too early to review | 45760.0 |
| 119 | Production | Production Technician I | Needs Improvement | 45760.0 |
| 120 | Production | Production Technician I | Needs Improvement | 45760.0 |
| 121 | Production | Production Technician I | Exceptional | 47840.0 |
| 122 | Production | Production Technician I | Fully Meets | 47840.0 |
| 123 | Production | Production Technician I | Fully Meets | 47840.0 |
| 124 | Production | Production Technician I | Fully Meets | 49920.0 |
| 125 | Production | Production Technician I | Fully Meets | 49920.0 |
| 126 | Production | Production Technician I | Fully Meets | 49920.0 |
| 127 | Production | Production Technician I | Fully Meets | 49920.0 |
| 128 | Production | Production Technician I | Fully Meets | 49920.0 |
| 129 | Production | Production Technician I | Fully Meets | 49920.0 |
| 130 | Production | Production Technician I | Fully Meets | 50960.0 |
| 131 | Production | Production Technician I | Exceeds | 51480.0 |
| 132 | Production | Production Technician II | Exceeds | 45760.0 |
| 133 | Production | Production Technician II | Fully Meets | 45760.0 |
| 134 | Production | Production Technician II | Fully Meets | 45760.0 |
| 135 | Production | Production Technician II | Fully Meets | 45760.0 |
| 136 | Production | Production Technician II | Fully Meets | 45760.0 |
| 137 | Production | Production Technician II | Exceeds | 46800.0 |
| 138 | Production | Production Technician II | N/A- too early to review | 47840.0 |
| 139 | Production | Production Technician II | Fully Meets | 49920.0 |
| 140 | Production | Production Technician II | Fully Meets | 49920.0 |
| 141 | Production | Production Technician II | PIP | 49920.0 |
| 142 | Production | Production Technician II | Fully Meets | 50440.0 |
| 143 | Production | Production Technician II | Fully Meets | 50440.0 |
| 144 | Production | Production Technician II | 90-day meets | 52000.0 |
| 145 | Production | Production Technician II | Exceptional | 52000.0 |
| 146 | Production | Production Technician II | Exceptional | 52000.0 |
| 147 | Production | Production Technician II | Fully Meets | 52000.0 |
| 148 | Production | Production Technician II | N/A- too early to review | 52000.0 |
| 149 | Production | Production Technician II | Fully Meets | 54080.0 |
| 150 | Production | Production Technician II | Fully Meets | 54080.0 |
| 151 | Production | Production Technician II | Fully Meets | 54080.0 |
| 152 | Production | Production Technician II | N/A- too early to review | 54080.0 |
| 153 | Production | Production Technician II | Exceeds | 54288.0 |
| 154 | Production | Production Technician II | Fully Meets | 54891.2 |
| 155 | Production | Production Technician II | Exceeds | 56160.0 |
| 156 | Production | Production Technician II | Fully Meets | 56160.0 |
| 157 | Production | Production Technician II | Fully Meets | 56160.0 |
| 158 | Production | Production Technician II | Fully Meets | 56160.0 |
| 159 | Production | Production Technician II | Needs Improvement | 56160.0 |
| 160 | Production | Production Technician II | N/A- too early to review | 58240.0 |
| 161 | Production | Production Technician II | Fully Meets | 60320.0 |
| 162 | Production | Production Technician II | Fully Meets | 60320.0 |
| 163 | Sales | Area Sales Manager | Fully Meets | 112320.0 |
| 164 | Sales | Area Sales Manager | 90-day meets | 114400.0 |
| 165 | Sales | Area Sales Manager | 90-day meets | 114400.0 |
| 166 | Sales | Area Sales Manager | Exceeds | 114400.0 |
| 167 | Sales | Area Sales Manager | Exceeds | 114400.0 |
| 168 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 169 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 170 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 171 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 172 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 173 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 174 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 175 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 176 | Sales | Area Sales Manager | Fully Meets | 114400.0 |
| 177 | Sales | Area Sales Manager | N/A- too early to review | 114400.0 |
| 178 | Sales | Area Sales Manager | Needs Improvement | 114400.0 |
| 179 | Sales | Area Sales Manager | PIP | 114400.0 |
| 180 | Sales | Area Sales Manager | Fully Meets | 115440.0 |
| 181 | Sales | Area Sales Manager | Fully Meets | 115440.0 |
| 182 | Sales | Area Sales Manager | Fully Meets | 116480.0 |
| 183 | Sales | Area Sales Manager | Fully Meets | 116480.0 |
| 184 | Sales | Area Sales Manager | Fully Meets | 116480.0 |
| 185 | Sales | Area Sales Manager | PIP | 116480.0 |
| 186 | Sales | Area Sales Manager | Fully Meets | 118560.0 |
| 187 | Sales | Sales Manager | Fully Meets | 112320.0 |
| 188 | Sales | Sales Manager | Needs Improvement | 116480.0 |
| 189 | Software Engineering | Software Engineer | Fully Meets | 99008.0 |
| 190 | Software Engineering | Software Engineer | Exceptional | 99840.0 |
| 191 | Software Engineering | Software Engineer | Fully Meets | 102440.0 |
| 192 | Software Engineering | Software Engineer | Fully Meets | 115460.8 |
| 193 | Software Engineering | Software Engineer | 90-day meets | 116480.0 |
| 194 | Software Engineering | Software Engineer | 90-day meets | 118809.6 |
# Build a grid of plots showing salary against employee performance score,
# broken down by position and department
g=sns.catplot(
kind="strip",
x="position",
y="USD per Year",
hue="Performance Score",
data=dfg_salary_over_performance_position,
col="department",
col_wrap=3,
height=4.5,
aspect = 0.8,
palette="tab10",
margin_titles=True,
sharex=False,
sharey=False,
marker='D',
s=10,
jitter=True,
alpha=0.9)
g.fig.suptitle(
"Salary rates versus employee performance score for comparable positions",
fontsize=16, x=0.50, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
# Set the font size and rotation of the tick labels
g.set_xticklabels(rotation=90,fontsize=9)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
Comparing salaries with employee performance scores did not reveal the expected direct relationship. In some cases (especially in IT/IS) the relationship even appeared to be inverse. This suggests that, for comparable positions, salary does not depend on the performance score in these data.
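The visual impression can be cross-checked numerically. Below is a minimal sketch, not part of the original analysis, that computes a Spearman rank correlation between performance score and salary within each position. The ordinal ranking in `SCORE_ORDER` is an assumption (the dataset does not define one), `rank_correlation_by_position` is a hypothetical helper, and the interim scores `90-day meets` and `N/A- too early to review` are excluded because they have no obvious place in that ranking. pandas is used here only for brevity, and the sample rows are copied from the table above.

```python
import pandas as pd

# Assumed ordinal ranking of the review scores (not defined in the source data).
SCORE_ORDER = {
    "PIP": 1,
    "Needs Improvement": 2,
    "Fully Meets": 3,
    "Exceeds": 4,
    "Exceptional": 5,
}


def rank_correlation_by_position(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation between score rank and salary for each position."""
    scored = df.assign(score_rank=df["Performance Score"].map(SCORE_ORDER))
    # Scores missing from SCORE_ORDER map to NaN and are dropped here.
    scored = scored.dropna(subset=["score_rank"])

    records = []
    for (department, position), group in scored.groupby(["department", "position"]):
        # Spearman's rho is the Pearson correlation of the ranks.
        x = group["score_rank"].rank()
        y = group["USD per Year"].rank()
        if x.nunique() < 2 or y.nunique() < 2:
            rho = float("nan")  # undefined when one of the variables is constant
        else:
            rho = float(x.corr(y))
        records.append(
            {
                "department": department,
                "position": position,
                "n": len(group),
                "spearman_rho": rho,
            }
        )
    return pd.DataFrame(records)


# A few rows copied from the result table above.
sample = pd.DataFrame(
    [
        ("IT/IS", "IT Support", "Exceeds", 54080.0),
        ("IT/IS", "IT Support", "Fully Meets", 57179.2),
        ("IT/IS", "IT Support", "Fully Meets", 60299.2),
        ("IT/IS", "IT Support", "Fully Meets", 65312.0),
        ("IT/IS", "Senior BI Developer", "Fully Meets", 104520.0),
        ("IT/IS", "Senior BI Developer", "Fully Meets", 106080.0),
        ("IT/IS", "Senior BI Developer", "Fully Meets", 108680.0),
        ("Sales", "Sales Manager", "Fully Meets", 112320.0),
        ("Sales", "Sales Manager", "Needs Improvement", 116480.0),
    ],
    columns=["department", "position", "Performance Score", "USD per Year"],
)

result = rank_correlation_by_position(sample)
print(result)
```

A negative coefficient corresponds to the inverse pattern noted for IT/IS. With groups this small the coefficient is only a rough indicator, so it complements the plot rather than replacing it.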
sql_quiery = \
"""
-- Build a query and a dataframe with the hiring source and pay rates for positions
-- held by several employees
-- It is used to build the grid of plots
-- To do this, create a temporary view with employee numbers, hiring source and pay rates
CREATE OR REPLACE TEMP VIEW
EmployeesSourcePayrate AS
SELECT
"Employee Number",
"Employee Source",
"Pay Rate"
FROM
hr_dataset
;
SELECT
department,
position,
"Employee Source",
-- Annualize the pay rate, assuming it is hourly: 2080 = 40 hours/week * 52 weeks
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM
PositionsWithMultActiveEmployees
LEFT JOIN
EmployeesSourcePayrate
USING ("Employee Number")
ORDER BY
department,
position,
"USD per Year",
"Employee Source"
;
"""
dfg_salary_over_source_position = pd.read_sql(sql_quiery, conn)
dfg_salary_over_source_position
| | department | position | Employee Source | USD per Year |
|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | Website Banner Ads | 47840.0 |
| 1 | Admin Offices | Accountant I | Diversity Job Fair | 59280.0 |
| 2 | Admin Offices | Accountant I | Internet Search | 60320.0 |
| 3 | Admin Offices | Administrative Assistant | Website Banner Ads | 34444.8 |
| 4 | Admin Offices | Administrative Assistant | Pay Per Click - Google | 44720.0 |
| 5 | Admin Offices | Sr. Accountant | Diversity Job Fair | 72696.0 |
| 6 | Admin Offices | Sr. Accountant | Other | 72696.0 |
| 7 | IT/IS | BI Developer | Indeed | 93600.0 |
| 8 | IT/IS | BI Developer | Indeed | 93600.0 |
| 9 | IT/IS | BI Developer | Indeed | 93600.0 |
| 10 | IT/IS | BI Developer | Indeed | 95680.0 |
| 11 | IT/IS | Database Administrator | Employee Referral | 62816.0 |
| 12 | IT/IS | Database Administrator | Employee Referral | 65312.0 |
| 13 | IT/IS | Database Administrator | Glassdoor | 70720.0 |
| 14 | IT/IS | Database Administrator | Diversity Job Fair | 73840.0 |
| 15 | IT/IS | Database Administrator | Employee Referral | 82264.0 |
| 16 | IT/IS | Database Administrator | Glassdoor | 83200.0 |
| 17 | IT/IS | Database Administrator | Pay Per Click - Google | 87776.0 |
| 18 | IT/IS | Database Administrator | Employee Referral | 88920.0 |
| 19 | IT/IS | IT Support | Diversity Job Fair | 54080.0 |
| 20 | IT/IS | IT Support | Information Session | 57179.2 |
| 21 | IT/IS | IT Support | Glassdoor | 60299.2 |
| 22 | IT/IS | IT Support | Vendor Referral | 65312.0 |
| 23 | IT/IS | Network Engineer | Monster.com | 56160.0 |
| 24 | IT/IS | Network Engineer | Employee Referral | 76960.0 |
| 25 | IT/IS | Network Engineer | Employee Referral | 81120.0 |
| 26 | IT/IS | Network Engineer | Information Session | 87360.0 |
| 27 | IT/IS | Network Engineer | Vendor Referral | 89440.0 |
| 28 | IT/IS | Network Engineer | Glassdoor | 93600.0 |
| 29 | IT/IS | Network Engineer | Vendor Referral | 97760.0 |
| 30 | IT/IS | Network Engineer | Employee Referral | 102128.0 |
| 31 | IT/IS | Senior BI Developer | Indeed | 104520.0 |
| 32 | IT/IS | Senior BI Developer | Indeed | 106080.0 |
| 33 | IT/IS | Senior BI Developer | Indeed | 108680.0 |
| 34 | IT/IS | Sr. Network Engineer | Employee Referral | 110240.0 |
| 35 | IT/IS | Sr. Network Engineer | On-campus Recruiting | 111904.0 |
| 36 | IT/IS | Sr. Network Engineer | Vendor Referral | 112528.0 |
| 37 | IT/IS | Sr. Network Engineer | Employee Referral | 114816.0 |
| 38 | IT/IS | Sr. Network Engineer | Employee Referral | 116896.0 |
| 39 | Production | Production Manager | Search Engine - Google Bing Yahoo | 106080.0 |
| 40 | Production | Production Manager | Vendor Referral | 108160.0 |
| 41 | Production | Production Manager | Internet Search | 110240.0 |
| 42 | Production | Production Manager | Monster.com | 110240.0 |
| 43 | Production | Production Manager | Employee Referral | 112320.0 |
| 44 | Production | Production Manager | Employee Referral | 113360.0 |
| 45 | Production | Production Manager | Billboard | 114400.0 |
| 46 | Production | Production Manager | Pay Per Click - Google | 114400.0 |
| 47 | Production | Production Manager | Website Banner Ads | 114400.0 |
| 48 | Production | Production Technician I | Glassdoor | 29120.0 |
| 49 | Production | Production Technician I | On-campus Recruiting | 29120.0 |
| 50 | Production | Production Technician I | Word of Mouth | 29120.0 |
| 51 | Production | Production Technician I | Billboard | 31200.0 |
| 52 | Production | Production Technician I | Diversity Job Fair | 31200.0 |
| 53 | Production | Production Technician I | Employee Referral | 31200.0 |
| 54 | Production | Production Technician I | MBTA ads | 31200.0 |
| 55 | Production | Production Technician I | MBTA ads | 31200.0 |
| 56 | Production | Production Technician I | Newspager/Magazine | 31200.0 |
| 57 | Production | Production Technician I | On-campus Recruiting | 31200.0 |
| 58 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 31616.0 |
| 59 | Production | Production Technician I | Billboard | 32760.0 |
| 60 | Production | Production Technician I | Billboard | 33280.0 |
| 61 | Production | Production Technician I | Diversity Job Fair | 33280.0 |
| 62 | Production | Production Technician I | Employee Referral | 33280.0 |
| 63 | Production | Production Technician I | On-campus Recruiting | 33280.0 |
| 64 | Production | Production Technician I | Professional Society | 33280.0 |
| 65 | Production | Production Technician I | Professional Society | 33280.0 |
| 66 | Production | Production Technician I | Professional Society | 33280.0 |
| 67 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 33280.0 |
| 68 | Production | Production Technician I | Word of Mouth | 33280.0 |
| 69 | Production | Production Technician I | Monster.com | 34840.0 |
| 70 | Production | Production Technician I | Social Networks - Facebook Twitter etc | 34860.8 |
| 71 | Production | Production Technician I | Employee Referral | 35360.0 |
| 72 | Production | Production Technician I | Glassdoor | 35360.0 |
| 73 | Production | Production Technician I | MBTA ads | 35360.0 |
| 74 | Production | Production Technician I | Newspager/Magazine | 35360.0 |
| 75 | Production | Production Technician I | Newspager/Magazine | 35360.0 |
| 76 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 35360.0 |
| 77 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 35360.0 |
| 78 | Production | Production Technician I | Website Banner Ads | 35360.0 |
| 79 | Production | Production Technician I | Professional Society | 37440.0 |
| 80 | Production | Production Technician I | Professional Society | 37440.0 |
| 81 | Production | Production Technician I | Billboard | 39520.0 |
| 82 | Production | Production Technician I | Employee Referral | 39520.0 |
| 83 | Production | Production Technician I | On-campus Recruiting | 39520.0 |
| 84 | Production | Production Technician I | Professional Society | 39520.0 |
| 85 | Production | Production Technician I | Social Networks - Facebook Twitter etc | 39520.0 |
| 86 | Production | Production Technician I | Vendor Referral | 39520.0 |
| 87 | Production | Production Technician I | Word of Mouth | 39520.0 |
| 88 | Production | Production Technician I | Internet Search | 40560.0 |
| 89 | Production | Production Technician I | Newspager/Magazine | 41080.0 |
| 90 | Production | Production Technician I | Diversity Job Fair | 41600.0 |
| 91 | Production | Production Technician I | Diversity Job Fair | 41600.0 |
| 92 | Production | Production Technician I | MBTA ads | 41600.0 |
| 93 | Production | Production Technician I | MBTA ads | 41600.0 |
| 94 | Production | Production Technician I | Monster.com | 41600.0 |
| 95 | Production | Production Technician I | Monster.com | 41600.0 |
| 96 | Production | Production Technician I | Newspager/Magazine | 41600.0 |
| 97 | Production | Production Technician I | Newspager/Magazine | 41600.0 |
| 98 | Production | Production Technician I | Other | 41600.0 |
| 99 | Production | Production Technician I | Professional Society | 41600.0 |
| 100 | Production | Production Technician I | Website Banner Ads | 41600.0 |
| 101 | Production | Production Technician I | Billboard | 43680.0 |
| 102 | Production | Production Technician I | Employee Referral | 43680.0 |
| 103 | Production | Production Technician I | Employee Referral | 43680.0 |
| 104 | Production | Production Technician I | On-campus Recruiting | 43680.0 |
| 105 | Production | Production Technician I | On-campus Recruiting | 43680.0 |
| 106 | Production | Production Technician I | Pay Per Click - Google | 43680.0 |
| 107 | Production | Production Technician I | Pay Per Click - Google | 43680.0 |
| 108 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 43680.0 |
| 109 | Production | Production Technician I | MBTA ads | 44200.0 |
| 110 | Production | Production Technician I | Billboard | 45760.0 |
| 111 | Production | Production Technician I | Diversity Job Fair | 45760.0 |
| 112 | Production | Production Technician I | Employee Referral | 45760.0 |
| 113 | Production | Production Technician I | MBTA ads | 45760.0 |
| 114 | Production | Production Technician I | Newspager/Magazine | 45760.0 |
| 115 | Production | Production Technician I | On-campus Recruiting | 45760.0 |
| 116 | Production | Production Technician I | Pay Per Click - Google | 45760.0 |
| 117 | Production | Production Technician I | Pay Per Click - Google | 45760.0 |
| 118 | Production | Production Technician I | Professional Society | 45760.0 |
| 119 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 45760.0 |
| 120 | Production | Production Technician I | Search Engine - Google Bing Yahoo | 45760.0 |
| 121 | Production | Production Technician I | MBTA ads | 47840.0 |
| 122 | Production | Production Technician I | Newspager/Magazine | 47840.0 |
| 123 | Production | Production Technician I | On-campus Recruiting | 47840.0 |
| 124 | Production | Production Technician I | Billboard | 49920.0 |
| 125 | Production | Production Technician I | MBTA ads | 49920.0 |
| 126 | Production | Production Technician I | Monster.com | 49920.0 |
| 127 | Production | Production Technician I | On-campus Recruiting | 49920.0 |
| 128 | Production | Production Technician I | Social Networks - Facebook Twitter etc | 49920.0 |
| 129 | Production | Production Technician I | Word of Mouth | 49920.0 |
| 130 | Production | Production Technician I | Employee Referral | 50960.0 |
| 131 | Production | Production Technician I | Employee Referral | 51480.0 |
| 132 | Production | Production Technician II | Employee Referral | 45760.0 |
| 133 | Production | Production Technician II | Information Session | 45760.0 |
| 134 | Production | Production Technician II | Newspager/Magazine | 45760.0 |
| 135 | Production | Production Technician II | Pay Per Click - Google | 45760.0 |
| 136 | Production | Production Technician II | Word of Mouth | 45760.0 |
| 137 | Production | Production Technician II | MBTA ads | 46800.0 |
| 138 | Production | Production Technician II | Vendor Referral | 47840.0 |
| 139 | Production | Production Technician II | Newspager/Magazine | 49920.0 |
| 140 | Production | Production Technician II | Newspager/Magazine | 49920.0 |
| 141 | Production | Production Technician II | Pay Per Click - Google | 49920.0 |
| 142 | Production | Production Technician II | Professional Society | 50440.0 |
| 143 | Production | Production Technician II | Vendor Referral | 50440.0 |
| 144 | Production | Production Technician II | Billboard | 52000.0 |
| 145 | Production | Production Technician II | Employee Referral | 52000.0 |
| 146 | Production | Production Technician II | Newspager/Magazine | 52000.0 |
| 147 | Production | Production Technician II | Professional Society | 52000.0 |
| 148 | Production | Production Technician II | Vendor Referral | 52000.0 |
| 149 | Production | Production Technician II | Careerbuilder | 54080.0 |
| 150 | Production | Production Technician II | Diversity Job Fair | 54080.0 |
| 151 | Production | Production Technician II | MBTA ads | 54080.0 |
| 152 | Production | Production Technician II | Newspager/Magazine | 54080.0 |
| 153 | Production | Production Technician II | Glassdoor | 54288.0 |
| 154 | Production | Production Technician II | Glassdoor | 54891.2 |
| 155 | Production | Production Technician II | Employee Referral | 56160.0 |
| 156 | Production | Production Technician II | Monster.com | 56160.0 |
| 157 | Production | Production Technician II | Professional Society | 56160.0 |
| 158 | Production | Production Technician II | Professional Society | 56160.0 |
| 159 | Production | Production Technician II | Word of Mouth | 56160.0 |
| 160 | Production | Production Technician II | Other | 58240.0 |
| 161 | Production | Production Technician II | Monster.com | 60320.0 |
| 162 | Production | Production Technician II | On-campus Recruiting | 60320.0 |
| 163 | Sales | Area Sales Manager | Professional Society | 112320.0 |
| 164 | Sales | Area Sales Manager | Billboard | 114400.0 |
| 165 | Sales | Area Sales Manager | Billboard | 114400.0 |
| 166 | Sales | Area Sales Manager | Employee Referral | 114400.0 |
| 167 | Sales | Area Sales Manager | Internet Search | 114400.0 |
| 168 | Sales | Area Sales Manager | Monster.com | 114400.0 |
| 169 | Sales | Area Sales Manager | Monster.com | 114400.0 |
| 170 | Sales | Area Sales Manager | Other | 114400.0 |
| 171 | Sales | Area Sales Manager | Other | 114400.0 |
| 172 | Sales | Area Sales Manager | Pay Per Click - Google | 114400.0 |
| 173 | Sales | Area Sales Manager | Pay Per Click - Google | 114400.0 |
| 174 | Sales | Area Sales Manager | Pay Per Click - Google | 114400.0 |
| 175 | Sales | Area Sales Manager | Website Banner Ads | 114400.0 |
| 176 | Sales | Area Sales Manager | Website Banner Ads | 114400.0 |
| 177 | Sales | Area Sales Manager | Website Banner Ads | 114400.0 |
| 178 | Sales | Area Sales Manager | Website Banner Ads | 114400.0 |
| 179 | Sales | Area Sales Manager | Website Banner Ads | 114400.0 |
| 180 | Sales | Area Sales Manager | Diversity Job Fair | 115440.0 |
| 181 | Sales | Area Sales Manager | Monster.com | 115440.0 |
| 182 | Sales | Area Sales Manager | Employee Referral | 116480.0 |
| 183 | Sales | Area Sales Manager | Pay Per Click - Google | 116480.0 |
| 184 | Sales | Area Sales Manager | Website Banner Ads | 116480.0 |
| 185 | Sales | Area Sales Manager | Website Banner Ads | 116480.0 |
| 186 | Sales | Area Sales Manager | Pay Per Click - Google | 118560.0 |
| 187 | Sales | Sales Manager | Pay Per Click - Google | 112320.0 |
| 188 | Sales | Sales Manager | Diversity Job Fair | 116480.0 |
| 189 | Software Engineering | Software Engineer | Vendor Referral | 99008.0 |
| 190 | Software Engineering | Software Engineer | MBTA ads | 99840.0 |
| 191 | Software Engineering | Software Engineer | Pay Per Click - Google | 102440.0 |
| 192 | Software Engineering | Software Engineer | Search Engine - Google Bing Yahoo | 115460.8 |
| 193 | Software Engineering | Software Engineer | Pay Per Click - Google | 116480.0 |
| 194 | Software Engineering | Software Engineer | Monster.com | 118809.6 |
# Build a grid of plots showing salary against employee hiring source,
# broken down by position and department
g=sns.catplot(
kind="strip",
x="position",
y="USD per Year",
hue="Employee Source",
data=dfg_salary_over_source_position,
col="department",
col_wrap=3,
height=4.5,
aspect = 0.8,
palette="tab20",
margin_titles=True,
sharex=False,
sharey=False,
marker='D',
s=10,
jitter=True,
alpha=0.9)
g.fig.suptitle("Salary versus employee hiring source for comparable positions",
fontsize=16, x=0.50, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
# Set the font size and rotation of the tick labels
g.set_xticklabels(rotation=90,fontsize=9)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
A noticeable trend is visible, if at all, only for the IT/IS department.
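One way to make this comparison less dependent on reading the plot is to express each salary as a percentage deviation from the average for its position and then average those deviations per hiring source. The sketch below is illustrative and not part of the original analysis. It runs the SQL against an in-memory SQLite table so that it is self-contained. The table name `staff`, the column names, and the helper `source_deviation` are assumptions, and the sample rows are copied from the table above.

```python
import sqlite3

QUERY = """
WITH position_avg AS (
    SELECT department, position, AVG(usd_per_year) AS avg_usd
    FROM staff
    GROUP BY department, position
)
SELECT
    s.employee_source,
    COUNT(*) AS employees,
    AVG(100.0 * (s.usd_per_year - p.avg_usd) / p.avg_usd) AS avg_deviation_pct
FROM staff AS s
JOIN position_avg AS p USING (department, position)
GROUP BY s.employee_source
ORDER BY avg_deviation_pct DESC, s.employee_source
"""

# A few rows copied from the result table above.
SAMPLE_ROWS = [
    ("Admin Offices", "Accountant I", "Website Banner Ads", 47840.0),
    ("Admin Offices", "Accountant I", "Diversity Job Fair", 59280.0),
    ("Admin Offices", "Accountant I", "Internet Search", 60320.0),
    ("IT/IS", "IT Support", "Diversity Job Fair", 54080.0),
    ("IT/IS", "IT Support", "Information Session", 57179.2),
    ("IT/IS", "IT Support", "Glassdoor", 60299.2),
    ("IT/IS", "IT Support", "Vendor Referral", 65312.0),
]


def source_deviation(rows):
    """Return (source, employees, avg deviation in %) tuples, highest first."""
    demo_conn = sqlite3.connect(":memory:")
    try:
        demo_conn.execute(
            "CREATE TABLE staff ("
            "department TEXT, position TEXT, "
            "employee_source TEXT, usd_per_year REAL)"
        )
        demo_conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?)", rows)
        return demo_conn.execute(QUERY).fetchall()
    finally:
        demo_conn.close()


deviations = source_deviation(SAMPLE_ROWS)
for source, employees, pct in deviations:
    print(f"{source:25s} {employees:3d} {pct:8.2f}%")
```

Sources with only one or two hires produce unstable averages, so the `employees` column should be read alongside the deviation.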
NOTE
- Here we consider all employees, regardless of employment status.
- The comparison must take departments and positions into account.
- These data will partly duplicate the data on length of service in the company, but here the chronology itself is of interest.
- Since there are no data on whether an employee's salary changed during their time at the company, the conclusions drawn here are tentative.
sql_quiery = \
"""
SELECT
"Employee Number",
department,
position,
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year",
"Date of Hire"
FROM
hr_dataset
ORDER BY
department,
position,
"Date of Hire"
"""
dfg_ = pd.read_sql(sql_quiery, conn)
dfg_
| | Employee Number | department | position | USD per Year | Date of Hire |
|---|---|---|---|---|---|
| 0 | 1103024456 | Admin Offices | Accountant I | 59280.0 | 2008-10-27 |
| 1 | 1106026572 | Admin Offices | Accountant I | 47840.0 | 2014-01-06 |
| 2 | 1302053333 | Admin Offices | Accountant I | 60320.0 | 2014-09-29 |
| 3 | 711007713 | Admin Offices | Administrative Assistant | 42640.0 | 2011-09-26 |
| 4 | 1211050782 | Admin Offices | Administrative Assistant | 44720.0 | 2015-02-16 |
| 5 | 1307059817 | Admin Offices | Administrative Assistant | 34444.8 | 2015-05-01 |
| 6 | 1206043417 | Admin Offices | Shared Services Manager | 114400.0 | 2011-02-21 |
| 7 | 1102024115 | Admin Offices | Shared Services Manager | 114400.0 | 2016-01-05 |
| 8 | 1201031308 | Admin Offices | Sr. Accountant | 72696.0 | 2009-01-05 |
| 9 | 1307060188 | Admin Offices | Sr. Accountant | 72696.0 | 2015-02-16 |
| 10 | 1001495124 | Executive Office | President & CEO | 166400.0 | 2012-07-02 |
| 11 | 1009919940 | IT/IS | BI Developer | 93600.0 | 2016-10-02 |
| 12 | 1009919980 | IT/IS | BI Developer | 95680.0 | 2017-02-15 |
| 13 | 1009920000 | IT/IS | BI Developer | 93600.0 | 2017-04-20 |
| 14 | 1009919990 | IT/IS | BI Developer | 93600.0 | 2017-04-20 |
| 15 | 1009919920 | IT/IS | BI Director | 132080.0 | 2016-09-06 |
| 16 | 1112030816 | IT/IS | CIO | 135200.0 | 2010-04-10 |
| 17 | 1009919950 | IT/IS | Data Architect | 114400.0 | 2017-01-07 |
| 18 | 1102024056 | IT/IS | Database Administrator | 89440.0 | 2014-07-07 |
| 19 | 1406068403 | IT/IS | Database Administrator | 73840.0 | 2014-11-10 |
| 20 | 1108027853 | IT/IS | Database Administrator | 88920.0 | 2014-11-10 |
| 21 | 1102023965 | IT/IS | Database Administrator | 85280.0 | 2014-12-01 |
| 22 | 1003018246 | IT/IS | Database Administrator | 83200.0 | 2015-01-05 |
| 23 | 1111030148 | IT/IS | Database Administrator | 93600.0 | 2015-01-05 |
| 24 | 808010278 | IT/IS | Database Administrator | 62816.0 | 2015-01-05 |
| 25 | 1410071156 | IT/IS | Database Administrator | 83408.0 | 2015-02-16 |
| 26 | 905013738 | IT/IS | Database Administrator | 100880.0 | 2015-02-16 |
| 27 | 1407068885 | IT/IS | Database Administrator | 82264.0 | 2015-02-16 |
| 28 | 1110029732 | IT/IS | Database Administrator | 65312.0 | 2015-03-30 |
| 29 | 1203032255 | IT/IS | Database Administrator | 87776.0 | 2015-03-30 |
| 30 | 1105025718 | IT/IS | Database Administrator | 70720.0 | 2015-03-30 |
| 31 | 1192991000 | IT/IS | IT Director | 135200.0 | 2011-04-15 |
| 32 | 1001175250 | IT/IS | IT Manager - DB | 43680.0 | 2012-01-09 |
| 33 | 1106026933 | IT/IS | IT Manager - DB | 128960.0 | 2013-01-20 |
| 34 | 1011022863 | IT/IS | IT Manager - Infra | 131040.0 | 2012-02-15 |
| 35 | 1101023754 | IT/IS | IT Manager - Support | 133120.0 | 2014-01-05 |
| 36 | 1501072093 | IT/IS | IT Support | 65312.0 | 2010-05-01 |
| 37 | 602000312 | IT/IS | IT Support | 54080.0 | 2011-01-21 |
| 38 | 1203032263 | IT/IS | IT Support | 57179.2 | 2011-06-10 |
| 39 | 1301052902 | IT/IS | IT Support | 60299.2 | 2012-09-05 |
| 40 | 906014183 | IT/IS | Network Engineer | 97760.0 | 2014-09-30 |
| 41 | 1212052023 | IT/IS | Network Engineer | 93600.0 | 2015-01-05 |
| 42 | 1988299991 | IT/IS | Network Engineer | 81120.0 | 2015-01-05 |
| 43 | 1104025466 | IT/IS | Network Engineer | 58240.0 | 2015-01-05 |
| 44 | 1101023540 | IT/IS | Network Engineer | 76960.0 | 2015-01-05 |
| 45 | 1001956578 | IT/IS | Network Engineer | 56160.0 | 2015-02-16 |
| 46 | 1102024173 | IT/IS | Network Engineer | 87360.0 | 2015-03-30 |
| 47 | 1012023013 | IT/IS | Network Engineer | 89440.0 | 2015-03-30 |
| 48 | 1411071506 | IT/IS | Network Engineer | 102128.0 | 2015-03-30 |
| 49 | 1009919930 | IT/IS | Senior BI Developer | 104520.0 | 2016-10-02 |
| 50 | 1009919960 | IT/IS | Senior BI Developer | 108680.0 | 2017-02-10 |
| 51 | 1009919970 | IT/IS | Senior BI Developer | 106080.0 | 2017-02-15 |
| 52 | 1412071562 | IT/IS | Sr. DBA | 121056.0 | 2014-02-17 |
| 53 | 1111030266 | IT/IS | Sr. DBA | 121680.0 | 2015-01-05 |
| 54 | 1307060199 | IT/IS | Sr. DBA | 128960.0 | 2015-03-30 |
| 55 | 1010022337 | IT/IS | Sr. DBA | 127504.0 | 2016-06-30 |
| 56 | 1411071312 | IT/IS | Sr. Network Engineer | 112528.0 | 2014-11-10 |
| 57 | 1308060959 | IT/IS | Sr. Network Engineer | 110240.0 | 2014-11-10 |
| 58 | 1108028108 | IT/IS | Sr. Network Engineer | 116896.0 | 2014-11-10 |
| 59 | 1301052347 | IT/IS | Sr. Network Engineer | 114816.0 | 2015-03-30 |
| 60 | 904013591 | IT/IS | Sr. Network Engineer | 111904.0 | 2016-06-30 |
| 61 | 1006020066 | Production | Director of Operations | 124800.0 | 2009-01-05 |
| 62 | 1405067298 | Production | Production Manager | 114400.0 | 2009-01-08 |
| 63 | 1000974650 | Production | Production Manager | 110240.0 | 2010-07-20 |
| 64 | 1402065355 | Production | Production Manager | 80080.0 | 2010-10-25 |
| 65 | 1001944783 | Production | Production Manager | 100880.0 | 2011-01-10 |
| 66 | 1403065874 | Production | Production Manager | 87360.0 | 2011-02-21 |
| 67 | 1501072311 | Production | Production Manager | 113360.0 | 2011-08-01 |
| 68 | 1410071026 | Production | Production Manager | 69680.0 | 2011-09-26 |
| 69 | 1107027351 | Production | Production Manager | 110240.0 | 2012-08-16 |
| 70 | 1102024149 | Production | Production Manager | 108160.0 | 2012-10-02 |
| 71 | 1103024679 | Production | Production Manager | 114400.0 | 2013-09-30 |
| 72 | 1303054580 | Production | Production Manager | 105040.0 | 2013-09-30 |
| 73 | 1409070147 | Production | Production Manager | 106080.0 | 2014-09-18 |
| 74 | 1307060077 | Production | Production Manager | 112320.0 | 2015-06-02 |
| 75 | 1110029990 | Production | Production Manager | 114400.0 | 2016-01-28 |
| 76 | 1001735072 | Production | Production Technician I | 35360.0 | 2007-11-05 |
| 77 | 1011022883 | Production | Production Technician I | 43680.0 | 2008-01-07 |
| 78 | 1001268402 | Production | Production Technician I | 45760.0 | 2008-09-02 |
| 79 | 1308060535 | Production | Production Technician I | 43680.0 | 2009-01-05 |
| 80 | 1405067501 | Production | Production Technician I | 31200.0 | 2009-04-27 |
| 81 | 1212051409 | Production | Production Technician I | 45760.0 | 2009-07-06 |
| 82 | 1307059944 | Production | Production Technician I | 35360.0 | 2010-04-26 |
| 83 | 1107027450 | Production | Production Technician I | 43680.0 | 2011-01-10 |
| 84 | 1307060058 | Production | Production Technician I | 49920.0 | 2011-01-10 |
| 85 | 1307060212 | Production | Production Technician I | 47840.0 | 2011-01-10 |
| 86 | 1111030244 | Production | Production Technician I | 29120.0 | 2011-01-10 |
| 87 | 1311062610 | Production | Production Technician I | 43680.0 | 2011-01-10 |
| 88 | 1206038000 | Production | Production Technician I | 49920.0 | 2011-01-10 |
| 89 | 1403065625 | Production | Production Technician I | 33280.0 | 2011-01-10 |
| 90 | 1103024504 | Production | Production Technician I | 41600.0 | 2011-01-10 |
| 91 | 1307060083 | Production | Production Technician I | 35360.0 | 2011-01-10 |
| 92 | 1109029531 | Production | Production Technician I | 37440.0 | 2011-02-07 |
| 93 | 1103024859 | Production | Production Technician I | 29120.0 | 2011-02-21 |
| 94 | 1107027575 | Production | Production Technician I | 31200.0 | 2011-02-21 |
| 95 | 1405067492 | Production | Production Technician I | 37440.0 | 2011-04-04 |
| 96 | 1406067865 | Production | Production Technician I | 35360.0 | 2011-04-04 |
| 97 | 1403066194 | Production | Production Technician I | 45760.0 | 2011-04-04 |
| 98 | 1102024274 | Production | Production Technician I | 39520.0 | 2011-05-16 |
| 99 | 1211050793 | Production | Production Technician I | 41600.0 | 2011-05-16 |
| 100 | 1109029186 | Production | Production Technician I | 45760.0 | 2011-05-16 |
| 101 | 1306058816 | Production | Production Technician I | 48880.0 | 2011-05-16 |
| 102 | 1101023839 | Production | Production Technician I | 43680.0 | 2011-05-16 |
| 103 | 1105026041 | Production | Production Technician I | 49920.0 | 2011-05-16 |
| 104 | 1401064327 | Production | Production Technician I | 41600.0 | 2011-05-31 |
| 105 | 1101023394 | Production | Production Technician I | 43680.0 | 2011-06-27 |
| 106 | 1305057440 | Production | Production Technician I | 37440.0 | 2011-07-05 |
| 107 | 1404066739 | Production | Production Technician I | 41600.0 | 2011-07-05 |
| 108 | 1409070522 | Production | Production Technician I | 41600.0 | 2011-07-05 |
| 109 | 1206042315 | Production | Production Technician I | 31720.0 | 2011-07-05 |
| 110 | 1001450968 | Production | Production Technician I | 31200.0 | 2011-07-11 |
| 111 | 1411071212 | Production | Production Technician I | 33280.0 | 2011-07-11 |
| 112 | 1001417624 | Production | Production Technician I | 33280.0 | 2011-07-11 |
| 113 | 1204033041 | Production | Production Technician I | 37440.0 | 2011-09-26 |
| 114 | 1011022814 | Production | Production Technician I | 43680.0 | 2011-09-26 |
| 115 | 710007401 | Production | Production Technician I | 45760.0 | 2011-09-26 |
| 116 | 903013071 | Production | Production Technician I | 49920.0 | 2011-09-26 |
| 117 | 1408069409 | Production | Production Technician I | 39520.0 | 2011-09-26 |
| 118 | 1208048229 | Production | Production Technician I | 43680.0 | 2011-09-26 |
| 119 | 1012023295 | Production | Production Technician I | 45760.0 | 2011-10-03 |
| 120 | 1308060622 | Production | Production Technician I | 45760.0 | 2011-11-07 |
| 121 | 1502072511 | Production | Production Technician I | 41600.0 | 2011-11-07 |
| 122 | 1001138521 | Production | Production Technician I | 39520.0 | 2011-11-07 |
| 123 | 1105025721 | Production | Production Technician I | 35360.0 | 2011-11-07 |
| 124 | 1405067064 | Production | Production Technician I | 49920.0 | 2011-11-28 |
| 125 | 1405067642 | Production | Production Technician I | 45760.0 | 2011-11-28 |
| 126 | 1006020020 | Production | Production Technician I | 49920.0 | 2012-01-09 |
| 127 | 1110029602 | Production | Production Technician I | 41080.0 | 2012-01-09 |
| 128 | 1304055947 | Production | Production Technician I | 47840.0 | 2012-01-09 |
| 129 | 1308060671 | Production | Production Technician I | 33280.0 | 2012-01-09 |
| 130 | 1101023679 | Production | Production Technician I | 34860.8 | 2012-02-20 |
| 131 | 803009012 | Production | Production Technician I | 47840.0 | 2012-02-20 |
| 132 | 1410071137 | Production | Production Technician I | 33280.0 | 2012-02-20 |
| 133 | 1406068293 | Production | Production Technician I | 41600.0 | 2012-04-02 |
| 134 | 1104025243 | Production | Production Technician I | 35360.0 | 2012-04-02 |
| 135 | 1409070245 | Production | Production Technician I | 29120.0 | 2012-04-02 |
| 136 | 1008020960 | Production | Production Technician I | 31200.0 | 2012-04-02 |
| 137 | 1109029366 | Production | Production Technician I | 33280.0 | 2012-04-02 |
| 138 | 1312063507 | Production | Production Technician I | 45760.0 | 2012-04-02 |
| 139 | 1202031618 | Production | Production Technician I | 34840.0 | 2012-04-02 |
| 140 | 1406068241 | Production | Production Technician I | 43680.0 | 2012-05-14 |
| 141 | 1407069061 | Production | Production Technician I | 29120.0 | 2012-05-14 |
| 142 | 1106026579 | Production | Production Technician I | 31200.0 | 2012-07-02 |
| 143 | 1001109612 | Production | Production Technician I | 31200.0 | 2012-07-02 |
| 144 | 1304055987 | Production | Production Technician I | 35360.0 | 2012-07-09 |
| 145 | 1002017900 | Production | Production Technician I | 39520.0 | 2012-08-13 |
| 146 | 1203032235 | Production | Production Technician I | 39520.0 | 2012-08-13 |
| 147 | 1101023619 | Production | Production Technician I | 31200.0 | 2012-08-13 |
| 148 | 1201031032 | Production | Production Technician I | 31200.0 | 2012-09-24 |
| 149 | 1212051962 | Production | Production Technician I | 37440.0 | 2012-09-24 |
| 150 | 1011022926 | Production | Production Technician I | 33280.0 | 2012-09-24 |
| 151 | 1201031438 | Production | Production Technician I | 41600.0 | 2012-11-05 |
| 152 | 1209048696 | Production | Production Technician I | 45760.0 | 2013-01-07 |
| 153 | 909015167 | Production | Production Technician I | 49920.0 | 2013-01-07 |
| 154 | 1409070255 | Production | Production Technician I | 49920.0 | 2013-02-18 |
| 155 | 1403066020 | Production | Production Technician I | 31200.0 | 2013-04-01 |
| 156 | 1204033041 | Production | Production Technician I | 45760.0 | 2013-04-01 |
| 157 | 1102024121 | Production | Production Technician I | 31200.0 | 2013-04-01 |
| 158 | 1405067138 | Production | Production Technician I | 43680.0 | 2013-05-13 |
| 159 | 1110029777 | Production | Production Technician I | 35360.0 | 2013-07-08 |
| 160 | 1203032357 | Production | Production Technician I | 39520.0 | 2013-07-08 |
| 161 | 1111030129 | Production | Production Technician I | 45760.0 | 2013-07-08 |
| 162 | 1011022887 | Production | Production Technician I | 35360.0 | 2013-07-08 |
| 163 | 1599991009 | Production | Production Technician I | 31200.0 | 2013-07-08 |
| 164 | 1104025414 | Production | Production Technician I | 37440.0 | 2013-07-08 |
| 165 | 1312063675 | Production | Production Technician I | 49920.0 | 2013-08-19 |
| 166 | 1412071844 | Production | Production Technician I | 45760.0 | 2013-08-19 |
| 167 | 1408069882 | Production | Production Technician I | 33280.0 | 2013-08-19 |
| 168 | 1205033102 | Production | Production Technician I | 31200.0 | 2013-08-19 |
| 169 | 1301052124 | Production | Production Technician I | 45760.0 | 2013-09-30 |
| 170 | 1503072857 | Production | Production Technician I | 43680.0 | 2013-09-30 |
| 171 | 1404066949 | Production | Production Technician I | 33280.0 | 2013-09-30 |
| 172 | 807010161 | Production | Production Technician I | 31616.0 | 2013-09-30 |
| 173 | 1301052462 | Production | Production Technician I | 39520.0 | 2013-09-30 |
| 174 | 1408069539 | Production | Production Technician I | 35360.0 | 2013-11-11 |
| 175 | 1408069635 | Production | Production Technician I | 41600.0 | 2013-11-11 |
| 176 | 1308060754 | Production | Production Technician I | 47840.0 | 2013-11-11 |
| 177 | 1501071909 | Production | Production Technician I | 50960.0 | 2013-11-11 |
| 178 | 1209049259 | Production | Production Technician I | 35360.0 | 2014-01-06 |
| 179 | 1412071713 | Production | Production Technician I | 39520.0 | 2014-01-06 |
| 180 | 710007555 | Production | Production Technician I | 35360.0 | 2014-01-06 |
| 181 | 1401064562 | Production | Production Technician I | 33280.0 | 2014-01-06 |
| 182 | 1107027392 | Production | Production Technician I | 37440.0 | 2014-02-17 |
| 183 | 1106026896 | Production | Production Technician I | 45760.0 | 2014-02-17 |
| 184 | 1103024335 | Production | Production Technician I | 45760.0 | 2014-02-17 |
| 185 | 1304055683 | Production | Production Technician I | 29120.0 | 2014-02-17 |
| 186 | 1012023152 | Production | Production Technician I | 45760.0 | 2014-02-17 |
| 187 | 1311063114 | Production | Production Technician I | 41600.0 | 2014-03-31 |
| 188 | 1206044851 | Production | Production Technician I | 38480.0 | 2014-03-31 |
| 189 | 1101023612 | Production | Production Technician I | 43680.0 | 2014-03-31 |
| 190 | 1404066622 | Production | Production Technician I | 41600.0 | 2014-05-12 |
| 191 | 706006285 | Production | Production Technician I | 43680.0 | 2014-05-12 |
| 192 | 1302053044 | Production | Production Technician I | 43680.0 | 2014-05-12 |
| 193 | 1305057282 | Production | Production Technician I | 40560.0 | 2014-05-12 |
| 194 | 1501072124 | Production | Production Technician I | 41600.0 | 2014-07-07 |
| 195 | 1308060366 | Production | Production Technician I | 33280.0 | 2014-07-07 |
| 196 | 1204032927 | Production | Production Technician I | 33280.0 | 2014-09-29 |
| 197 | 1211051232 | Production | Production Technician I | 31200.0 | 2014-09-29 |
| 198 | 1302053362 | Production | Production Technician I | 43680.0 | 2014-09-29 |
| 199 | 1208048062 | Production | Production Technician I | 33280.0 | 2014-09-29 |
| 200 | 1007020403 | Production | Production Technician I | 44200.0 | 2014-11-10 |
| 201 | 1403066069 | Production | Production Technician I | 32760.0 | 2015-01-05 |
| 202 | 1201031310 | Production | Production Technician I | 39520.0 | 2015-01-05 |
| 203 | 1101023353 | Production | Production Technician I | 41600.0 | 2015-02-16 |
| 204 | 1309061015 | Production | Production Technician I | 39520.0 | 2015-03-30 |
| 205 | 1109029256 | Production | Production Technician I | 41600.0 | 2015-03-30 |
| 206 | 1501072192 | Production | Production Technician I | 39520.0 | 2015-03-30 |
| 207 | 1302053339 | Production | Production Technician I | 37440.0 | 2015-05-11 |
| 208 | 1106026474 | Production | Production Technician I | 41600.0 | 2015-07-05 |
| 209 | 1410070998 | Production | Production Technician I | 41600.0 | 2016-07-04 |
| 210 | 1407069280 | Production | Production Technician I | 51480.0 | 2016-07-06 |
| 211 | 1311063172 | Production | Production Technician I | 41080.0 | 2016-07-06 |
| 212 | 1011022777 | Production | Production Technician II | 47840.0 | 2007-06-25 |
| 213 | 1012023103 | Production | Production Technician II | 47840.0 | 2009-10-26 |
| 214 | 1411071324 | Production | Production Technician II | 60320.0 | 2010-04-26 |
| 215 | 1209048697 | Production | Production Technician II | 54080.0 | 2010-08-30 |
| 216 | 1106026462 | Production | Production Technician II | 60320.0 | 2010-08-30 |
| 217 | 1008021030 | Production | Production Technician II | 54080.0 | 2011-01-10 |
| 218 | 1107027551 | Production | Production Technician II | 45760.0 | 2011-01-10 |
| 219 | 1011022932 | Production | Production Technician II | 49920.0 | 2011-01-10 |
| 220 | 1306058509 | Production | Production Technician II | 47840.0 | 2011-02-21 |
| 221 | 1012023204 | Production | Production Technician II | 49920.0 | 2011-04-04 |
| 222 | 1403066125 | Production | Production Technician II | 56160.0 | 2011-04-04 |
| 223 | 1301052449 | Production | Production Technician II | 54080.0 | 2011-04-04 |
| 224 | 1104025179 | Production | Production Technician II | 60320.0 | 2011-04-04 |
| 225 | 1012023226 | Production | Production Technician II | 52000.0 | 2011-05-16 |
| 226 | 1406068345 | Production | Production Technician II | 59800.0 | 2011-05-16 |
| 227 | 1411071406 | Production | Production Technician II | 60320.0 | 2011-05-16 |
| 228 | 1109029103 | Production | Production Technician II | 60320.0 | 2011-05-16 |
| 229 | 1202031821 | Production | Production Technician II | 58240.0 | 2011-07-05 |
| 230 | 1008020942 | Production | Production Technician II | 46800.0 | 2011-07-05 |
| 231 | 1005019209 | Production | Production Technician II | 60320.0 | 2011-07-05 |
| 232 | 1499902991 | Production | Production Technician II | 45760.0 | 2011-07-05 |
| 233 | 1304055986 | Production | Production Technician II | 47840.0 | 2011-07-05 |
| 234 | 1205033439 | Production | Production Technician II | 52000.0 | 2011-08-15 |
| 235 | 1207046956 | Production | Production Technician II | 58240.0 | 2011-08-15 |
| 236 | 1402065340 | Production | Production Technician II | 54080.0 | 2011-09-26 |
| 237 | 1201031274 | Production | Production Technician II | 52000.0 | 2011-11-07 |
| 238 | 1404066711 | Production | Production Technician II | 56160.0 | 2011-11-07 |
| 239 | 1408069503 | Production | Production Technician II | 54080.0 | 2012-01-09 |
| 240 | 1011022820 | Production | Production Technician II | 52000.0 | 2012-03-05 |
| 241 | 1104025486 | Production | Production Technician II | 58240.0 | 2012-04-02 |
| 242 | 1103024843 | Production | Production Technician II | 54080.0 | 2012-04-02 |
| 243 | 1001856521 | Production | Production Technician II | 52000.0 | 2012-05-14 |
| 244 | 1405067188 | Production | Production Technician II | 60320.0 | 2013-01-07 |
| 245 | 1010022030 | Production | Production Technician II | 45760.0 | 2013-01-07 |
| 246 | 1001103149 | Production | Production Technician II | 52000.0 | 2013-05-13 |
| 247 | 1301052436 | Production | Production Technician II | 60320.0 | 2013-05-13 |
| 248 | 1305056276 | Production | Production Technician II | 49920.0 | 2013-07-08 |
| 249 | 1405067565 | Production | Production Technician II | 45760.0 | 2013-07-08 |
| 250 | 1001504432 | Production | Production Technician II | 54288.0 | 2013-08-19 |
| 251 | 1108028351 | Production | Production Technician II | 56160.0 | 2013-09-30 |
| 252 | 1307059937 | Production | Production Technician II | 49920.0 | 2013-11-11 |
| 253 | 1402065085 | Production | Production Technician II | 49920.0 | 2014-02-17 |
| 254 | 1001549006 | Production | Production Technician II | 50440.0 | 2014-05-12 |
| 255 | 1105025661 | Production | Production Technician II | 49920.0 | 2014-07-07 |
| 256 | 1306057810 | Production | Production Technician II | 52000.0 | 2014-07-07 |
| 257 | 1108028428 | Production | Production Technician II | 56160.0 | 2014-07-07 |
| 258 | 1012023010 | Production | Production Technician II | 50440.0 | 2014-07-07 |
| 259 | 1011022818 | Production | Production Technician II | 45760.0 | 2014-08-18 |
| 260 | 1001970770 | Production | Production Technician II | 45760.0 | 2014-09-29 |
| 261 | 1101023457 | Production | Production Technician II | 45760.0 | 2014-09-29 |
| 262 | 1205033180 | Production | Production Technician II | 45760.0 | 2014-09-29 |
| 263 | 1104025435 | Production | Production Technician II | 54891.2 | 2014-11-10 |
| 264 | 1406067957 | Production | Production Technician II | 54080.0 | 2015-03-30 |
| 265 | 1103024924 | Production | Production Technician II | 58240.0 | 2015-06-05 |
| 266 | 1110029623 | Production | Production Technician II | 47840.0 | 2016-05-11 |
| 267 | 1106026433 | Production | Production Technician II | 52000.0 | 2016-06-06 |
| 268 | 1303054329 | Production | Production Technician II | 56160.0 | 2016-07-21 |
| 269 | 1502072711 | Sales | Area Sales Manager | 114400.0 | 2006-01-09 |
| 270 | 1411071295 | Sales | Area Sales Manager | 112320.0 | 2010-09-27 |
| 271 | 1409070567 | Sales | Area Sales Manager | 114400.0 | 2011-01-10 |
| 272 | 1001167253 | Sales | Area Sales Manager | 114400.0 | 2011-03-07 |
| 273 | 1204032843 | Sales | Area Sales Manager | 115440.0 | 2011-03-07 |
| 274 | 1312063714 | Sales | Area Sales Manager | 114400.0 | 2011-07-05 |
| 275 | 1504073368 | Sales | Area Sales Manager | 114400.0 | 2011-08-15 |
| 276 | 1411071302 | Sales | Area Sales Manager | 114400.0 | 2011-09-06 |
| 277 | 1102024106 | Sales | Area Sales Manager | 114400.0 | 2012-01-09 |
| 278 | 1408069481 | Sales | Area Sales Manager | 114400.0 | 2012-02-20 |
| 279 | 1111030503 | Sales | Area Sales Manager | 116480.0 | 2012-03-05 |
| 280 | 1412071660 | Sales | Area Sales Manager | 114400.0 | 2012-04-30 |
| 281 | 1209048771 | Sales | Area Sales Manager | 116480.0 | 2012-05-14 |
| 282 | 1111030684 | Sales | Area Sales Manager | 114400.0 | 2013-07-08 |
| 283 | 1104025008 | Sales | Area Sales Manager | 114400.0 | 2013-08-19 |
| 284 | 1501072180 | Sales | Area Sales Manager | 118560.0 | 2013-09-30 |
| 285 | 1411071481 | Sales | Area Sales Manager | 115440.0 | 2014-05-12 |
| 286 | 1001084890 | Sales | Area Sales Manager | 116480.0 | 2014-05-12 |
| 287 | 1302053046 | Sales | Area Sales Manager | 114400.0 | 2014-07-07 |
| 288 | 1403065721 | Sales | Area Sales Manager | 114400.0 | 2014-08-18 |
| 289 | 1306059197 | Sales | Area Sales Manager | 116480.0 | 2014-08-18 |
| 290 | 1504073313 | Sales | Area Sales Manager | 114400.0 | 2014-09-29 |
| 291 | 1306057978 | Sales | Area Sales Manager | 114400.0 | 2014-09-29 |
| 292 | 1401064637 | Sales | Area Sales Manager | 114400.0 | 2014-09-29 |
| 293 | 812011761 | Sales | Area Sales Manager | 114400.0 | 2015-01-05 |
| 294 | 1203032099 | Sales | Area Sales Manager | 114400.0 | 2015-02-16 |
| 295 | 1209049326 | Sales | Area Sales Manager | 114400.0 | 2016-07-06 |
| 296 | 1009021646 | Sales | Director of Sales | 124800.0 | 2014-05-05 |
| 297 | 1109029264 | Sales | Sales Manager | 125320.0 | 2011-11-07 |
| 298 | 1402065303 | Sales | Sales Manager | 112320.0 | 2014-05-05 |
| 299 | 1499902910 | Sales | Sales Manager | 116480.0 | 2014-05-18 |
| 300 | 1401064670 | Software Engineering | Software Engineer | 100880.0 | 2011-05-02 |
| 301 | 1112030979 | Software Engineering | Software Engineer | 108680.0 | 2011-11-07 |
| 302 | 1203032498 | Software Engineering | Software Engineer | 118809.6 | 2012-01-09 |
| 303 | 1012023185 | Software Engineering | Software Engineer | 102440.0 | 2012-11-05 |
| 304 | 1102024057 | Software Engineering | Software Engineer | 94473.6 | 2013-02-18 |
| 305 | 1101023577 | Software Engineering | Software Engineer | 116480.0 | 2013-11-11 |
| 306 | 1303054625 | Software Engineering | Software Engineer | 115460.8 | 2013-11-11 |
| 307 | 1201031324 | Software Engineering | Software Engineer | 99840.0 | 2014-07-07 |
| 308 | 1107027358 | Software Engineering | Software Engineer | 99008.0 | 2014-11-10 |
| 309 | 1001644719 | Software Engineering | Software Engineering Manager | 56160.0 | 2011-08-15 |
# Build a grid of plots showing salary against hire date,
# broken down by position and department
g = sns.relplot(
    x="Date of Hire",
    y="USD per Year",
    data=dfg_,
    col="department",
    hue="position",
    col_wrap=3,
    height=4,
    aspect=1,
    palette="tab10",  # tab10 has fewer colors than there are positions, but positions are split by department, so the panels stay readable
    kind='line',
    marker='s',
    facet_kws=dict(margin_titles=True, sharex=True, sharey=True),
)
g.fig.suptitle("Annual salary versus hire date by department and position",
               fontsize=16, x=0.50, y=1.03)
# Set the font size of the x axis label
g.set_xlabels(fontsize=10)
# Set the font size of the y axis label
g.set_ylabels(fontsize=10)
# Set the size and rotation of the x tick labels
g.set_xticklabels(rotation=90, fontsize=9)
# Adjust the vertical spacing between the individual panels
g.fig.subplots_adjust(hspace=0.15)
plt.show()
CONCLUSION
NOTE
For the reasons given above, this calculation is unlikely to reflect the real picture. It is built on an "as if" basis: as if the recruiting cost data covered the entire period spanned by the personnel register. With real data covering that period, the same calculation would give correct results.
sql_quiery = \
"""
WITH
empl_source_salary_count AS
(SELECT
"Employee Source" AS empl_source,
COUNT("Employee Number"),
ROUND(AVG("Pay Rate")::numeric *2080, 2) AS "Average Salary",
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "Pay Rate")::numeric * 2080, 2) AS "Median Salary"
FROM hr_dataset
GROUP BY
"Employee Source"
),
empl_source_price AS
(SELECT
"Employment Source" AS empl_source,
"Total"
FROM
recruiting_costs
),
empl_count_per_source_price AS
(SELECT
*
FROM
(empl_source_salary_count
LEFT JOIN
empl_source_price
USING (empl_source)
)
)
SELECT
empl_source AS "Recruitment Source",
count AS "Employee Count",
"Total" AS "Total per Source",
ROUND("Total" / (SUM(count) OVER (PARTITION BY empl_source)), 2) AS "Per Employee",
"Average Salary",
"Median Salary"
FROM
empl_count_per_source_price
ORDER BY
"Per Employee" DESC
"""
df_recruiting_costs_per_employee_salary = pd.read_sql(sql_quiery, conn)
df_recruiting_costs_per_employee_salary
| | Recruitment Source | Employee Count | Total per Source | Per Employee | Average Salary | Median Salary |
|---|---|---|---|---|---|---|
| 0 | Indeed | 8 | NaN | NaN | 101270.00 | 100100.0 |
| 1 | Careerbuilder | 1 | 7790.0 | 7790.00 | 54080.00 | 54080.0 |
| 2 | Pay Per Click | 1 | 1323.0 | 1323.00 | 31200.00 | 31200.0 |
| 3 | MBTA ads | 17 | 10980.0 | 645.88 | 50990.59 | 45760.0 |
| 4 | On-campus Recruiting | 12 | 7500.0 | 625.00 | 48845.33 | 44720.0 |
| 5 | Website Banner Ads | 13 | 7143.0 | 549.46 | 86329.60 | 114400.0 |
| 6 | Social Networks - Facebook Twitter etc | 11 | 5573.0 | 506.64 | 56161.89 | 41600.0 |
| 7 | Newspager/Magazine | 18 | 8291.0 | 460.61 | 49660.00 | 47840.0 |
| 8 | Other | 9 | 3995.0 | 443.89 | 79028.44 | 72696.0 |
| 9 | Billboard | 16 | 6192.0 | 387.00 | 62562.50 | 47320.0 |
| 10 | Diversity Job Fair | 29 | 10021.0 | 345.55 | 59754.81 | 45760.0 |
| 11 | Monster.com | 24 | 5760.0 | 240.00 | 65703.73 | 53040.0 |
| 12 | Search Engine - Google Bing Yahoo | 25 | 5183.0 | 207.32 | 51955.07 | 45760.0 |
| 13 | Pay Per Click - Google | 21 | 3509.0 | 167.10 | 83294.10 | 87776.0 |
| 14 | Professional Society | 20 | 1200.0 | 60.00 | 64922.00 | 48100.0 |
| 15 | Employee Referral | 31 | 0.0 | 0.00 | 75400.00 | 76960.0 |
| 16 | On-line Web application | 1 | 0.0 | 0.00 | 37440.00 | 37440.0 |
| 17 | Information Session | 4 | 0.0 | 0.00 | 57454.80 | 51469.6 |
| 18 | Internet Search | 6 | 0.0 | 0.00 | 72886.67 | 70200.0 |
| 19 | Vendor Referral | 15 | 0.0 | 0.00 | 84773.87 | 93600.0 |
| 20 | Company Intranet - Partner | 1 | 0.0 | 0.00 | 128960.00 | 128960.0 |
| 21 | Glassdoor | 14 | 0.0 | 0.00 | 58237.03 | 54589.6 |
| 22 | Word of Mouth | 13 | 0.0 | 0.00 | 43360.00 | 43680.0 |
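The `NaN` in the Indeed row means the `LEFT JOIN` found no usable `"Total"` for that source. One plausible cause, not verified against the database, is that the source name is spelled differently in `recruiting_costs`. A minimal pandas sketch of how such unmatched keys can be spotted; the frames below are toy data, and the `"Indeed.com"` spelling is a hypothetical mismatch introduced only for illustration:

```python
import pandas as pd

# Toy stand-ins for the two CTEs; "Indeed.com" is a hypothetical spelling mismatch
hires = pd.DataFrame({"empl_source": ["Indeed", "Careerbuilder", "Billboard"],
                      "count": [8, 1, 16]})
costs = pd.DataFrame({"empl_source": ["Indeed.com", "Careerbuilder", "Billboard"],
                      "Total": [0, 7790, 6192]})

# indicator=True adds a "_merge" column that says where each row came from
merged = hires.merge(costs, on="empl_source", how="left", indicator=True)

# Rows present only on the left side had no matching cost record
unmatched = merged.loc[merged["_merge"] == "left_only", "empl_source"].tolist()
print(unmatched)
```

This is a diagnostic aid only; the final DataFrame for the chart is still the one produced by the SQL query.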
dfg_recruiting_costs_per_employee_salary = df_recruiting_costs_per_employee_salary.dropna()
labels = dfg_recruiting_costs_per_employee_salary["Recruitment Source"].tolist()
# Set up the matplotlib figure: three stacked panels sharing the x axis
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15, 8), sharex=True)
fig.suptitle("Recruiting cost compared with average and median salary (USD)", fontsize=16, y=0.925)
# Plot the recruiting cost per employee by recruitment source
sns.barplot(x="Recruitment Source",
            y="Per Employee",
            data=dfg_recruiting_costs_per_employee_salary,
            palette="winter",
            ax=ax1)
ax1.set_ylabel("USD per Employee", fontsize=10)
ax1.set_yscale('log')  # use a logarithmic scale for readability
ax1.set_xlabel(None)  # hide the x axis label on ax1
# Plot the average annual salary by recruitment source
sns.barplot(x="Recruitment Source",
            y="Average Salary",
            data=dfg_recruiting_costs_per_employee_salary,
            palette="cool",
            ax=ax2)
ax2.set_ylabel("Average Salary per Year, USD", fontsize=10)
ax2.set_xlabel(None)  # hide the x axis label on ax2
# Plot the median annual salary by recruitment source
sns.barplot(x="Recruitment Source",
            y="Median Salary",
            data=dfg_recruiting_costs_per_employee_salary,
            palette="summer",
            ax=ax3)
ax3.set_ylabel("Median Salary per Year, USD", fontsize=10)
ax3.set_xticklabels(labels=labels, rotation=90, fontsize=10)  # rotate the x tick labels and set their size
# Draw a horizontal line at the mean cost of hiring one employee
# Mean cost per hire across all sources (total spend divided by total head count):
y = dfg_recruiting_costs_per_employee_salary["Total per Source"].sum() / \
    dfg_recruiting_costs_per_employee_salary["Employee Count"].sum()
ax1.axhline(y,
            color='maroon',
            linewidth=1.0,
            linestyle='-.')
# Label the mean line
ax1.annotate(
    text=f"average cost per hire: USD {y:,.1f}",  # annotation for the mean line
    xy=(17, y-75),  # label position in axis (data) units
    xycoords="data",  # xy is given in data coordinates
    horizontalalignment='center',  # horizontal text alignment
    verticalalignment='center',  # vertical text alignment
    rotation="horizontal",  # label rotation
    color='maroon',  # label color
    alpha=1,
    fontsize=10)
# Draw a horizontal line at the median cost of hiring one employee
# The median is weighted by the number of employees hired from each source.
# Sources with no cost data (NaN) are counted as zero cost here.
df_calc = df_recruiting_costs_per_employee_salary[["Employee Count", "Per Employee"]].fillna(0)
# Sort by the value whose median is sought; the head counts are then accumulated in that order
df_calc = df_calc.sort_values("Per Employee")
cumsum = df_calc['Employee Count'].cumsum()
cutoff = df_calc['Employee Count'].sum() / 2.0
median = df_calc["Per Employee"]
median = median[cumsum >= cutoff]
median = median.iloc[0]
y = median
ax1.axhline(y,
            color='indigo',
            linewidth=1,
            linestyle='--')
# Label the median line
ax1.annotate(
    text=f"median cost per hire: USD {y:,.1f}",  # annotation for the median line
    xy=(12, y+100),  # label position in axis (data) units
    xycoords="data",  # xy is given in data coordinates
    horizontalalignment='center',  # horizontal text alignment
    verticalalignment='center',  # vertical text alignment
    rotation="horizontal",  # label rotation
    color='indigo',  # label color
    alpha=1,
    fontsize=10)
plt.show()
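The cell above computes a median weighted by head count. The same idea can be isolated into a small helper, sketched here in plain Python. The function name `weighted_median` and the toy inputs are illustrative assumptions, not part of the original notebook:

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half of the total weight."""
    # Sort by value (not by weight), carrying each value's weight along
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= total / 2:
            return value


# Hypothetical toy data: cost per hire and head count for three sources
costs = [0.0, 60.0, 645.88]
counts = [31, 20, 17]
print(weighted_median(costs, counts))
```

Because the function accepts any two sequences of equal length, it could also be called with the `"Per Employee"` and `"Employee Count"` columns of a DataFrame.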
CONCLUSION
NOTE
KPIs are available only for the Production department, and only for the positions Production Technician I and II.
sql_quiery = \
"""
-- Build the queries for a DataFrame that combines annual salary with KPIs
CREATE OR REPLACE TEMPORARY VIEW
kpi_s AS
SELECT
"Employee Name",
"Position",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Position",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Position",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Position",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
;
WITH
Salary AS
(SELECT
"Employee Name",
"Employee Number",
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM
hr_dataset
WHERE
rtrim(department, ' ') = 'Production'
)
SELECT
*
FROM
kpi_s
INNER JOIN
Salary
USING ("Employee Name")
WHERE "Position"<>'Production Manager'
ORDER BY
"Position",
"USD per Year",
"Employee Name",
"KPI_Name"
"""
dfg_salary_over_production_KPI = pd.read_sql(sql_quiery, conn)
dfg_salary_over_production_KPI
| | Employee Name | Position | KPI_Name | KPI_Value | Employee Number | USD per Year |
|---|---|---|---|---|---|---|
| 0 | Gross, Paula | Production Technician I | 90-day Complaints | 0.0 | 1103024859 | 29120.0 |
| 1 | Gross, Paula | Production Technician I | Abutments/Hour Wk 1 | 9.0 | 1103024859 | 29120.0 |
| 2 | Gross, Paula | Production Technician I | Abutments/Hour Wk 2 | 10.0 | 1103024859 | 29120.0 |
| 3 | Gross, Paula | Production Technician I | Daily Error Rate | 0.0 | 1103024859 | 29120.0 |
| 4 | Knapp, Bradley J | Production Technician I | 90-day Complaints | 0.0 | 1304055683 | 29120.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 763 | Sahoo, Adil | Production Technician II | Daily Error Rate | 0.0 | 1106026462 | 60320.0 |
| 764 | Winthrop, Jordan | Production Technician II | 90-day Complaints | 0.0 | 1405067188 | 60320.0 |
| 765 | Winthrop, Jordan | Production Technician II | Abutments/Hour Wk 1 | 12.0 | 1405067188 | 60320.0 |
| 766 | Winthrop, Jordan | Production Technician II | Abutments/Hour Wk 2 | 11.0 | 1405067188 | 60320.0 |
| 767 | Winthrop, Jordan | Production Technician II | Daily Error Rate | 0.0 | 1405067188 | 60320.0 |
768 rows × 6 columns
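The `UNION ALL` chain in the `kpi_s` view turns four KPI columns into name/value pairs, one row per employee and KPI. For comparison only, the same wide-to-long reshaping can be sketched in pandas with `DataFrame.melt`. The toy frame below uses made-up values and placeholder employee names:

```python
import pandas as pd

# Toy stand-in for production_staff; all values are hypothetical
wide_df = pd.DataFrame({
    "Employee Name": ["Employee A", "Employee B"],
    "Position": ["Production Technician I", "Production Technician II"],
    "Abutments/Hour Wk 1": [9, 12],
    "Abutments/Hour Wk 2": [10, 11],
    "Daily Error Rate": [0, 1],
    "90-day Complaints": [0, 0],
})

# Keep the identifying columns and stack every other column as (KPI_Name, KPI_Value)
long_df = wide_df.melt(
    id_vars=["Employee Name", "Position"],
    var_name="KPI_Name",
    value_name="KPI_Value",
)
print(long_df.shape)
```

Each employee contributes one row per KPI, which is why the result above has four times as many rows as there are joined employees.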
# List of KPIs:
kpi_list = ["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "90-day Complaints", "Daily Error Rate"]
# Plot salary against each KPI with a fitted regression line
g = sns.lmplot(data=dfg_salary_over_production_KPI,
               x="KPI_Value",
               y="USD per Year",
               hue="Position",
               col="KPI_Name",
               col_order=kpi_list,
               col_wrap=2,
               palette="tab10",
               height=4,
               aspect=1,
               facet_kws=dict(sharex=False, sharey=True)
               )
g.fig.suptitle("Relationship between annual salary and KPIs in the Production department (linear regression)",
               fontsize=16, y=1.05)
# Set the font size of the x axis label
g.set_xlabels(fontsize=10)
# Set the font size of the y axis label
g.set_ylabels(fontsize=10)
plt.show()
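`sns.lmplot` draws the regression lines but does not report their coefficients. If the slope and the strength of the relationship are needed as numbers, they can be computed separately. Below is a minimal sketch of an ordinary least squares fit in plain Python; the helper name `fit_line` and the toy inputs are assumptions made for illustration:

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical toy data: a KPI value and an annual salary for five employees
kpi_values = [8, 9, 10, 11, 12]
salaries = [30000, 32000, 34000, 36000, 38000]
slope, intercept, r = fit_line(kpi_values, salaries)
print(slope, intercept, r)
```

In practice the function would be applied once per combination of `Position` and `KPI_Name`. It assumes the inputs are not constant, since a constant column makes `sxx` or `syy` zero.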
CONCLUSION
First, let us look at the overall figures for the organization
sql_quiery = \
"""
-- select the required data from hr_dataset
CREATE OR REPLACE TEMPORARY VIEW
salary_age_schedule AS
(SELECT
age,
"Employee Number",
ROUND("Pay Rate"::numeric*2080, 2) AS "USD per Year"
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
age
)
;
(SELECT
age,
COUNT("Employee Number") AS "empl_count",
SUM("USD per Year") AS total_per_year,
MIN("USD per Year") AS min_per_year,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "USD per Year") AS median_per_year,
MAX("USD per Year") AS max_per_year,
ROUND(AVG("USD per Year"), 2) AS avg_per_year
FROM
salary_age_schedule
GROUP BY
age
)
UNION ALL
(SELECT
'0' AS age, -- age = 0 marks the company-wide total row
COUNT("Employee Number") AS "empl_count",
SUM("USD per Year") AS total_per_year,
MIN("USD per Year") AS min_per_year,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "USD per Year") AS median_per_year,
MAX("USD per Year") AS max_per_year,
AVG("USD per Year") AS avg_per_year
FROM
salary_age_schedule
)
"""
df_salary_over_age = pd.read_sql(sql_quiery, conn)
df_salary_over_age
| | age | empl_count | total_per_year | min_per_year | median_per_year | max_per_year | avg_per_year |
|---|---|---|---|---|---|---|---|
| 0 | 25 | 2 | 91520.0 | 35360.0 | 45760.0 | 56160.0 | 45760.00 |
| 1 | 26 | 1 | 58240.0 | 58240.0 | 58240.0 | 58240.0 | 58240.00 |
| 2 | 27 | 4 | 243360.0 | 33280.0 | 47840.0 | 114400.0 | 60840.00 |
| 3 | 28 | 9 | 763360.0 | 33280.0 | 93600.0 | 118560.0 | 84817.78 |
| 4 | 29 | 12 | 1028560.0 | 43680.0 | 94640.0 | 116480.0 | 85713.33 |
| 5 | 30 | 9 | 686025.6 | 34444.8 | 89440.0 | 115460.8 | 76225.07 |
| 6 | 31 | 17 | 1241136.0 | 34840.0 | 60320.0 | 131040.0 | 73008.00 |
| 7 | 32 | 8 | 458286.4 | 31200.0 | 52665.6 | 116896.0 | 57285.80 |
| 8 | 33 | 11 | 723840.0 | 31200.0 | 45760.0 | 116480.0 | 65803.64 |
| 9 | 34 | 12 | 694220.8 | 34860.8 | 48880.0 | 114400.0 | 57851.73 |
| 10 | 35 | 10 | 625664.0 | 29120.0 | 44720.0 | 124800.0 | 62566.40 |
| 11 | 36 | 8 | 729352.0 | 35360.0 | 108160.0 | 114400.0 | 91169.00 |
| 12 | 37 | 5 | 415480.0 | 49920.0 | 81120.0 | 135200.0 | 83096.00 |
| 13 | 38 | 10 | 726169.6 | 33280.0 | 52000.0 | 135200.0 | 72616.96 |
| 14 | 39 | 14 | 708136.0 | 31200.0 | 45760.0 | 99008.0 | 50581.14 |
| 15 | 40 | 6 | 385216.0 | 29120.0 | 47840.0 | 114816.0 | 64202.67 |
| 16 | 41 | 8 | 351000.0 | 32760.0 | 45760.0 | 54080.0 | 43875.00 |
| 17 | 42 | 7 | 574080.0 | 39520.0 | 108160.0 | 114400.0 | 82011.43 |
| 18 | 43 | 5 | 225056.0 | 31616.0 | 47840.0 | 60320.0 | 45011.20 |
| 19 | 44 | 5 | 343720.0 | 35360.0 | 44200.0 | 116480.0 | 68744.00 |
| 20 | 45 | 5 | 387899.2 | 41600.0 | 57179.2 | 128960.0 | 77579.84 |
| 21 | 46 | 1 | 132080.0 | 132080.0 | 132080.0 | 132080.0 | 132080.00 |
| 22 | 47 | 4 | 328640.0 | 33280.0 | 81120.0 | 133120.0 | 82160.00 |
| 23 | 48 | 7 | 396136.0 | 33280.0 | 52000.0 | 108680.0 | 56590.86 |
| 24 | 49 | 6 | 458640.0 | 29120.0 | 83720.0 | 115440.0 | 76440.00 |
| 25 | 50 | 2 | 156000.0 | 41600.0 | 78000.0 | 114400.0 | 78000.00 |
| 26 | 51 | 4 | 178880.0 | 31200.0 | 45760.0 | 56160.0 | 44720.00 |
| 27 | 52 | 4 | 334880.0 | 43680.0 | 83200.0 | 124800.0 | 83720.00 |
| 28 | 53 | 2 | 165360.0 | 50960.0 | 82680.0 | 114400.0 | 82680.00 |
| 29 | 54 | 3 | 199680.0 | 39520.0 | 45760.0 | 114400.0 | 66560.00 |
| 30 | 55 | 1 | 114400.0 | 114400.0 | 114400.0 | 114400.0 | 114400.00 |
| 31 | 56 | 1 | 33280.0 | 33280.0 | 33280.0 | 33280.0 | 33280.00 |
| 32 | 59 | 1 | 45760.0 | 45760.0 | 45760.0 | 45760.0 | 45760.00 |
| 33 | 63 | 2 | 281840.0 | 115440.0 | 140920.0 | 166400.0 | 140920.00 |
| 34 | 66 | 1 | 112528.0 | 112528.0 | 112528.0 | 112528.0 | 112528.00 |
| 35 | 67 | 1 | 33280.0 | 33280.0 | 33280.0 | 33280.0 | 33280.00 |
| 36 | 0 | 208 | 14431705.6 | 29120.0 | 53040.0 | 166400.0 | 69383.20 |
To check visually whether the median salary rate depends on age, let us build a linear regression plot.
# Drop the last row (the company-wide total, stored with age = 0) before plotting
dfg_salary_over_age = df_salary_over_age[:-1]
# Plot the per-age median salary against age
g=sns.lmplot(x="age",
y="median_per_year",
palette="Accent",
height=6,
data=dfg_salary_over_age)
g.fig.suptitle("Per-age median salary (USD per year) versus age (linear regression)",
fontsize=16, y=1.05)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=9)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=9)
plt.show()
Let us do the same for the individual salary rates.
sql_quiery = \
"""
SELECT
*
FROM
salary_age_schedule
ORDER BY
age
"""
df_ind_salary_over_age = pd.read_sql(sql_quiery, conn)
df_ind_salary_over_age
# Plot individual annual salaries against age
g=sns.lmplot(x="age",
y="USD per Year",
palette="Set1",
height=6,
data=df_ind_salary_over_age)
g.fig.suptitle("Individual annual salaries (USD) versus age (linear regression)",
fontsize=16, y=1.05)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=9)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=9)
plt.show()
Interim conclusion
Across the company as a whole, the median salary rate shows a linear relationship with age, with a slight upward slope of about 25,000 USD per year across the 42-year age range. This can be explained in part by the fact that management positions with higher salaries are held by middle-aged and older employees. Note that the confidence interval also widens considerably as age increases.
However, the distribution of individual salary rates by age across the organization shows no dependence and looks fairly random.
It therefore makes sense to examine it within identical positions, wherever more than one employee holds the position.
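The slope read off the regression plot can be cross-checked numerically. The sketch below is a stand-alone illustration outside the SQL pipeline: it fits an ordinary least-squares line to the (age, median_per_year) pairs copied from the per-age table above, with the total row excluded. The helper name `ols_line` is an assumption introduced for this example; it is not part of the notebook.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (age, median_per_year) pairs copied from the per-age table, total row excluded
age_median = [
    (25, 45760.0), (26, 58240.0), (27, 47840.0), (28, 93600.0),
    (29, 94640.0), (30, 89440.0), (31, 60320.0), (32, 52665.6),
    (33, 45760.0), (34, 48880.0), (35, 44720.0), (36, 108160.0),
    (37, 81120.0), (38, 52000.0), (39, 45760.0), (40, 47840.0),
    (41, 45760.0), (42, 108160.0), (43, 47840.0), (44, 44200.0),
    (45, 57179.2), (46, 132080.0), (47, 81120.0), (48, 52000.0),
    (49, 83720.0), (50, 78000.0), (51, 45760.0), (52, 83200.0),
    (53, 82680.0), (54, 45760.0), (55, 114400.0), (56, 33280.0),
    (59, 45760.0), (63, 140920.0), (66, 112528.0), (67, 33280.0),
]

ages = [age for age, _ in age_median]
medians = [median for _, median in age_median]

slope, intercept = ols_line(ages, medians)
age_span = max(ages) - min(ages)

print(f"slope: {slope:.1f} USD per year of age")
print(f"fitted rise over {age_span} years of age: {slope * age_span:.0f} USD")
```

Each age contributes one point regardless of how many employees it covers, so ages represented by a single person weigh as much as ages with a dozen employees. This is one reason the fitted line should be read as a tendency only.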
sql_quiery = \
"""
-- Build a query and a dataframe with age and salary rates for positions held by several employees
-- It is used to draw the grid of plots
-- To do so, use the temporary views created earlier: one with employee numbers, age, and rates,
-- and one with the positions held by several employees
SELECT
department,
position,
age,
"USD per Year"
FROM
PositionsWithMultActiveEmployees
LEFT JOIN
salary_age_schedule
USING ("Employee Number")
ORDER BY
department,
position,
age,
"USD per Year"
;
"""
dfg_salary_over_positions_age = pd.read_sql(sql_quiery, conn)
#dfg_salary_over_positions_age
# Build a grid of plots of salary versus age, by position
g=sns.lmplot(
x="age",
y="USD per Year",
data=dfg_salary_over_positions_age,
col="department",
hue="position",
col_wrap=3,
height=4,
aspect = 1,
palette="tab20",
facet_kws=dict(margin_titles=True, sharex=True, sharey=True),
)
g.fig.suptitle("Annual salary versus employee age by department and position",
fontsize=16, x=0.45, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
plt.show()
CONCLUSION
Across the organization as a whole, salary shows practically no dependence on age (for the positions where such a dependence could be assessed), or only a very slight one, except for a few cases where it is noticeable as a tendency.
It is worth stressing once more that these dependencies are tendencies only.
What is unexpected (at least to me) is the presence of an inverse relationship between age and salary rate. There is no point in guessing at the causes here; rather, the HR department should be asked what might explain it.
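One way to put a number on these per-position tendencies, including the inverse ones, is a Pearson correlation between age and salary within each position. The following is a minimal sketch under stated assumptions: the function names and the sample rows are hypothetical and are not taken from the database, and the calculation is done in plain Python only to illustrate the idea.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation, or None when undefined (under 2 points or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def age_salary_correlation(rows):
    """rows: iterable of (position, age, usd_per_year).

    Returns {position: (employee_count, correlation_or_None)}.
    """
    groups = defaultdict(list)
    for position, age, salary in rows:
        groups[position].append((age, salary))
    result = {}
    for position, pairs in groups.items():
        ages = [age for age, _ in pairs]
        salaries = [salary for _, salary in pairs]
        result[position] = (len(pairs), pearson_r(ages, salaries))
    return result


# Hypothetical rows for illustration only, not taken from the database
sample = [
    ("Position A", 30, 50000.0), ("Position A", 40, 55000.0), ("Position A", 50, 60000.0),
    ("Position B", 30, 60000.0), ("Position B", 40, 55000.0), ("Position B", 50, 50000.0),
    ("Position C", 35, 70000.0), ("Position C", 45, 70000.0),
]

for position, (count, r) in age_salary_correlation(sample).items():
    label = "undefined" if r is None else f"{r:+.2f}"
    print(f"{position}: n={count}, r={label}")
```

Reporting the employee count next to each coefficient matters here: with only two or three people in a position, a strong positive or negative correlation can arise by chance.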
sql_quiery = \
"""
-- Build a query and a dataframe with sex and salary rates for positions
-- held by several employees
-- It is used to draw the grid of plots
-- To do so, create a subquery with employee numbers, sex, and rates
WITH
EmployeesSexAndPayrate AS
(SELECT
"Employee Number",
sex,
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM
hr_dataset
)
SELECT
department,
position,
sex,
"USD per Year"
FROM
PositionsWithMultActiveEmployees
LEFT JOIN
EmployeesSexAndPayrate
USING ("Employee Number")
ORDER BY
department,
position,
sex,
"USD per Year"
;
"""
dfg_salary_over_sex_positions = pd.read_sql(sql_quiery, conn)
dfg_salary_over_sex_positions
| | department | position | sex | USD per Year |
|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | Female | 59280.0 |
| 1 | Admin Offices | Accountant I | Male | 47840.0 |
| 2 | Admin Offices | Accountant I | Male | 60320.0 |
| 3 | Admin Offices | Administrative Assistant | Female | 34444.8 |
| 4 | Admin Offices | Administrative Assistant | Female | 44720.0 |
| 5 | Admin Offices | Sr. Accountant | Female | 72696.0 |
| 6 | Admin Offices | Sr. Accountant | Female | 72696.0 |
| 7 | IT/IS | BI Developer | Female | 93600.0 |
| 8 | IT/IS | BI Developer | Male | 93600.0 |
| 9 | IT/IS | BI Developer | Male | 93600.0 |
| 10 | IT/IS | BI Developer | Male | 95680.0 |
| 11 | IT/IS | Database Administrator | Female | 65312.0 |
| 12 | IT/IS | Database Administrator | Female | 70720.0 |
| 13 | IT/IS | Database Administrator | Female | 82264.0 |
| 14 | IT/IS | Database Administrator | Female | 83200.0 |
| 15 | IT/IS | Database Administrator | Female | 88920.0 |
| 16 | IT/IS | Database Administrator | Male | 62816.0 |
| 17 | IT/IS | Database Administrator | Male | 73840.0 |
| 18 | IT/IS | Database Administrator | Male | 87776.0 |
| 19 | IT/IS | IT Support | Female | 54080.0 |
| 20 | IT/IS | IT Support | Female | 57179.2 |
| 21 | IT/IS | IT Support | Female | 65312.0 |
| 22 | IT/IS | IT Support | Male | 60299.2 |
| 23 | IT/IS | Network Engineer | Female | 56160.0 |
| 24 | IT/IS | Network Engineer | Female | 76960.0 |
| 25 | IT/IS | Network Engineer | Female | 81120.0 |
| 26 | IT/IS | Network Engineer | Female | 97760.0 |
| 27 | IT/IS | Network Engineer | Male | 87360.0 |
| 28 | IT/IS | Network Engineer | Male | 89440.0 |
| 29 | IT/IS | Network Engineer | Male | 93600.0 |
| 30 | IT/IS | Network Engineer | Male | 102128.0 |
| 31 | IT/IS | Senior BI Developer | Female | 104520.0 |
| 32 | IT/IS | Senior BI Developer | Male | 106080.0 |
| 33 | IT/IS | Senior BI Developer | Male | 108680.0 |
| 34 | IT/IS | Sr. Network Engineer | Female | 112528.0 |
| 35 | IT/IS | Sr. Network Engineer | Female | 114816.0 |
| 36 | IT/IS | Sr. Network Engineer | Male | 110240.0 |
| 37 | IT/IS | Sr. Network Engineer | Male | 111904.0 |
| 38 | IT/IS | Sr. Network Engineer | Male | 116896.0 |
| 39 | Production | Production Manager | Female | 106080.0 |
| 40 | Production | Production Manager | Female | 108160.0 |
| 41 | Production | Production Manager | Female | 114400.0 |
| 42 | Production | Production Manager | Female | 114400.0 |
| 43 | Production | Production Manager | Male | 110240.0 |
| 44 | Production | Production Manager | Male | 110240.0 |
| 45 | Production | Production Manager | Male | 112320.0 |
| 46 | Production | Production Manager | Male | 113360.0 |
| 47 | Production | Production Manager | Male | 114400.0 |
| 48 | Production | Production Technician I | Female | 29120.0 |
| 49 | Production | Production Technician I | Female | 29120.0 |
| 50 | Production | Production Technician I | Female | 31200.0 |
| 51 | Production | Production Technician I | Female | 31200.0 |
| 52 | Production | Production Technician I | Female | 31200.0 |
| 53 | Production | Production Technician I | Female | 31200.0 |
| 54 | Production | Production Technician I | Female | 31200.0 |
| 55 | Production | Production Technician I | Female | 31616.0 |
| 56 | Production | Production Technician I | Female | 32760.0 |
| 57 | Production | Production Technician I | Female | 33280.0 |
| 58 | Production | Production Technician I | Female | 33280.0 |
| 59 | Production | Production Technician I | Female | 33280.0 |
| 60 | Production | Production Technician I | Female | 33280.0 |
| 61 | Production | Production Technician I | Female | 33280.0 |
| 62 | Production | Production Technician I | Female | 34840.0 |
| 63 | Production | Production Technician I | Female | 35360.0 |
| 64 | Production | Production Technician I | Female | 35360.0 |
| 65 | Production | Production Technician I | Female | 35360.0 |
| 66 | Production | Production Technician I | Female | 35360.0 |
| 67 | Production | Production Technician I | Female | 39520.0 |
| 68 | Production | Production Technician I | Female | 39520.0 |
| 69 | Production | Production Technician I | Female | 39520.0 |
| 70 | Production | Production Technician I | Female | 40560.0 |
| 71 | Production | Production Technician I | Female | 41080.0 |
| 72 | Production | Production Technician I | Female | 41600.0 |
| 73 | Production | Production Technician I | Female | 41600.0 |
| 74 | Production | Production Technician I | Female | 41600.0 |
| 75 | Production | Production Technician I | Female | 41600.0 |
| 76 | Production | Production Technician I | Female | 41600.0 |
| 77 | Production | Production Technician I | Female | 41600.0 |
| 78 | Production | Production Technician I | Female | 41600.0 |
| 79 | Production | Production Technician I | Female | 41600.0 |
| 80 | Production | Production Technician I | Female | 43680.0 |
| 81 | Production | Production Technician I | Female | 43680.0 |
| 82 | Production | Production Technician I | Female | 43680.0 |
| 83 | Production | Production Technician I | Female | 45760.0 |
| 84 | Production | Production Technician I | Female | 45760.0 |
| 85 | Production | Production Technician I | Female | 45760.0 |
| 86 | Production | Production Technician I | Female | 45760.0 |
| 87 | Production | Production Technician I | Female | 45760.0 |
| 88 | Production | Production Technician I | Female | 45760.0 |
| 89 | Production | Production Technician I | Female | 45760.0 |
| 90 | Production | Production Technician I | Female | 47840.0 |
| 91 | Production | Production Technician I | Female | 47840.0 |
| 92 | Production | Production Technician I | Female | 47840.0 |
| 93 | Production | Production Technician I | Female | 49920.0 |
| 94 | Production | Production Technician I | Female | 49920.0 |
| 95 | Production | Production Technician I | Female | 49920.0 |
| 96 | Production | Production Technician I | Female | 49920.0 |
| 97 | Production | Production Technician I | Female | 49920.0 |
| 98 | Production | Production Technician I | Female | 49920.0 |
| 99 | Production | Production Technician I | Female | 50960.0 |
| 100 | Production | Production Technician I | Male | 29120.0 |
| 101 | Production | Production Technician I | Male | 31200.0 |
| 102 | Production | Production Technician I | Male | 31200.0 |
| 103 | Production | Production Technician I | Male | 33280.0 |
| 104 | Production | Production Technician I | Male | 33280.0 |
| 105 | Production | Production Technician I | Male | 33280.0 |
| 106 | Production | Production Technician I | Male | 33280.0 |
| 107 | Production | Production Technician I | Male | 34860.8 |
| 108 | Production | Production Technician I | Male | 35360.0 |
| 109 | Production | Production Technician I | Male | 35360.0 |
| 110 | Production | Production Technician I | Male | 35360.0 |
| 111 | Production | Production Technician I | Male | 35360.0 |
| 112 | Production | Production Technician I | Male | 37440.0 |
| 113 | Production | Production Technician I | Male | 37440.0 |
| 114 | Production | Production Technician I | Male | 39520.0 |
| 115 | Production | Production Technician I | Male | 39520.0 |
| 116 | Production | Production Technician I | Male | 39520.0 |
| 117 | Production | Production Technician I | Male | 39520.0 |
| 118 | Production | Production Technician I | Male | 41600.0 |
| 119 | Production | Production Technician I | Male | 41600.0 |
| 120 | Production | Production Technician I | Male | 41600.0 |
| 121 | Production | Production Technician I | Male | 43680.0 |
| 122 | Production | Production Technician I | Male | 43680.0 |
| 123 | Production | Production Technician I | Male | 43680.0 |
| 124 | Production | Production Technician I | Male | 43680.0 |
| 125 | Production | Production Technician I | Male | 43680.0 |
| 126 | Production | Production Technician I | Male | 44200.0 |
| 127 | Production | Production Technician I | Male | 45760.0 |
| 128 | Production | Production Technician I | Male | 45760.0 |
| 129 | Production | Production Technician I | Male | 45760.0 |
| 130 | Production | Production Technician I | Male | 45760.0 |
| 131 | Production | Production Technician I | Male | 51480.0 |
| 132 | Production | Production Technician II | Female | 45760.0 |
| 133 | Production | Production Technician II | Female | 46800.0 |
| 134 | Production | Production Technician II | Female | 47840.0 |
| 135 | Production | Production Technician II | Female | 49920.0 |
| 136 | Production | Production Technician II | Female | 49920.0 |
| 137 | Production | Production Technician II | Female | 50440.0 |
| 138 | Production | Production Technician II | Female | 52000.0 |
| 139 | Production | Production Technician II | Female | 52000.0 |
| 140 | Production | Production Technician II | Female | 52000.0 |
| 141 | Production | Production Technician II | Female | 52000.0 |
| 142 | Production | Production Technician II | Female | 54080.0 |
| 143 | Production | Production Technician II | Female | 54080.0 |
| 144 | Production | Production Technician II | Female | 54288.0 |
| 145 | Production | Production Technician II | Female | 54891.2 |
| 146 | Production | Production Technician II | Female | 56160.0 |
| 147 | Production | Production Technician II | Female | 56160.0 |
| 148 | Production | Production Technician II | Female | 56160.0 |
| 149 | Production | Production Technician II | Female | 56160.0 |
| 150 | Production | Production Technician II | Female | 58240.0 |
| 151 | Production | Production Technician II | Male | 45760.0 |
| 152 | Production | Production Technician II | Male | 45760.0 |
| 153 | Production | Production Technician II | Male | 45760.0 |
| 154 | Production | Production Technician II | Male | 45760.0 |
| 155 | Production | Production Technician II | Male | 49920.0 |
| 156 | Production | Production Technician II | Male | 50440.0 |
| 157 | Production | Production Technician II | Male | 52000.0 |
| 158 | Production | Production Technician II | Male | 54080.0 |
| 159 | Production | Production Technician II | Male | 54080.0 |
| 160 | Production | Production Technician II | Male | 56160.0 |
| 161 | Production | Production Technician II | Male | 60320.0 |
| 162 | Production | Production Technician II | Male | 60320.0 |
| 163 | Sales | Area Sales Manager | Female | 112320.0 |
| 164 | Sales | Area Sales Manager | Female | 114400.0 |
| 165 | Sales | Area Sales Manager | Female | 114400.0 |
| 166 | Sales | Area Sales Manager | Female | 114400.0 |
| 167 | Sales | Area Sales Manager | Female | 114400.0 |
| 168 | Sales | Area Sales Manager | Female | 114400.0 |
| 169 | Sales | Area Sales Manager | Female | 114400.0 |
| 170 | Sales | Area Sales Manager | Female | 114400.0 |
| 171 | Sales | Area Sales Manager | Female | 114400.0 |
| 172 | Sales | Area Sales Manager | Female | 118560.0 |
| 173 | Sales | Area Sales Manager | Male | 114400.0 |
| 174 | Sales | Area Sales Manager | Male | 114400.0 |
| 175 | Sales | Area Sales Manager | Male | 114400.0 |
| 176 | Sales | Area Sales Manager | Male | 114400.0 |
| 177 | Sales | Area Sales Manager | Male | 114400.0 |
| 178 | Sales | Area Sales Manager | Male | 114400.0 |
| 179 | Sales | Area Sales Manager | Male | 114400.0 |
| 180 | Sales | Area Sales Manager | Male | 114400.0 |
| 181 | Sales | Area Sales Manager | Male | 115440.0 |
| 182 | Sales | Area Sales Manager | Male | 115440.0 |
| 183 | Sales | Area Sales Manager | Male | 116480.0 |
| 184 | Sales | Area Sales Manager | Male | 116480.0 |
| 185 | Sales | Area Sales Manager | Male | 116480.0 |
| 186 | Sales | Area Sales Manager | Male | 116480.0 |
| 187 | Sales | Sales Manager | Female | 112320.0 |
| 188 | Sales | Sales Manager | Male | 116480.0 |
| 189 | Software Engineering | Software Engineer | Female | 99008.0 |
| 190 | Software Engineering | Software Engineer | Female | 102440.0 |
| 191 | Software Engineering | Software Engineer | Female | 115460.8 |
| 192 | Software Engineering | Software Engineer | Female | 116480.0 |
| 193 | Software Engineering | Software Engineer | Female | 118809.6 |
| 194 | Software Engineering | Software Engineer | Male | 99840.0 |
# Build a grid of plots of salary versus employee sex, by position and department
g=sns.catplot(
kind="violin",
x="position",
y="USD per Year",
hue="sex",
data=dfg_salary_over_sex_positions,
col="department",
col_wrap=3,
height=4.5,
aspect = 0.8,
palette="Pastel1",
margin_titles=True,
sharex=False,
sharey=False,
split=True
)
g.fig.suptitle("Annual salary versus employee sex for comparable positions",
fontsize=16, x=0.50, y=1.03)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=10)
# Set the size and rotation of the tick labels under the violins
g.set_xticklabels(rotation=90,fontsize=9)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
Overall, salary rates for men and women differ within the positions examined.
Given only these position-specific differences, one can hardly speak of a company-wide tendency in the pay gap between women and men. Most likely, the differences in salary rates for the positions examined are explained not by the employees' sex but by other factors.
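The position-level differences can also be summarized as a median gap per position. The sketch below uses an excerpt of three positions copied from the table above; the function name `median_gap_by_position` is an assumption made for this illustration, and the calculation is a plain-Python cross-check rather than part of the SQL analysis.

```python
from collections import defaultdict
from statistics import median


def median_gap_by_position(rows):
    """rows: iterable of (position, sex, usd_per_year).

    Returns {position: (female_median, male_median, male_minus_female)}.
    Positions that lack either sex are skipped.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for position, sex, salary in rows:
        groups[position][sex].append(salary)
    gaps = {}
    for position, by_sex in groups.items():
        if "Female" in by_sex and "Male" in by_sex:
            female_median = median(by_sex["Female"])
            male_median = median(by_sex["Male"])
            gaps[position] = (female_median, male_median, male_median - female_median)
    return gaps


# Excerpt copied from the table above (three positions only)
rows = [
    ("Accountant I", "Female", 59280.0),
    ("Accountant I", "Male", 47840.0),
    ("Accountant I", "Male", 60320.0),
    ("BI Developer", "Female", 93600.0),
    ("BI Developer", "Male", 93600.0),
    ("BI Developer", "Male", 93600.0),
    ("BI Developer", "Male", 95680.0),
    ("Sr. Network Engineer", "Female", 112528.0),
    ("Sr. Network Engineer", "Female", 114816.0),
    ("Sr. Network Engineer", "Male", 110240.0),
    ("Sr. Network Engineer", "Male", 111904.0),
    ("Sr. Network Engineer", "Male", 116896.0),
]

for position, (female, male, gap) in median_gap_by_position(rows).items():
    print(f"{position}: female {female:.0f}, male {male:.0f}, male - female {gap:+.0f}")
```

With only one to three people per group in this excerpt, the medians are illustrative. A gap computed this way describes the data but does not by itself show that sex is the cause of the difference.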
sql_quiery = \
"""
-- Build a query and a dataframe with marital status and salary rates for positions
-- held by several employees
-- It is used to draw the grid of plots
WITH
EmployeesMaritalAndPayrate AS
(SELECT
"Employee Number",
maritaldesc,
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM
hr_dataset
)
SELECT
department,
position,
maritaldesc,
"USD per Year"
FROM
PositionsWithMultActiveEmployees
LEFT JOIN
EmployeesMaritalAndPayrate
USING ("Employee Number")
ORDER BY
department,
position,
"USD per Year",
maritaldesc
;
"""
dfg_salary_over_marital = pd.read_sql(sql_quiery, conn)
dfg_salary_over_marital
| | department | position | maritaldesc | USD per Year |
|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | Divorced | 47840.0 |
| 1 | Admin Offices | Accountant I | Married | 59280.0 |
| 2 | Admin Offices | Accountant I | Single | 60320.0 |
| 3 | Admin Offices | Administrative Assistant | Single | 34444.8 |
| 4 | Admin Offices | Administrative Assistant | Married | 44720.0 |
| 5 | Admin Offices | Sr. Accountant | Married | 72696.0 |
| 6 | Admin Offices | Sr. Accountant | Married | 72696.0 |
| 7 | IT/IS | BI Developer | Married | 93600.0 |
| 8 | IT/IS | BI Developer | Married | 93600.0 |
| 9 | IT/IS | BI Developer | Married | 93600.0 |
| 10 | IT/IS | BI Developer | Single | 95680.0 |
| 11 | IT/IS | Database Administrator | Married | 62816.0 |
| 12 | IT/IS | Database Administrator | Single | 65312.0 |
| 13 | IT/IS | Database Administrator | Single | 70720.0 |
| 14 | IT/IS | Database Administrator | Divorced | 73840.0 |
| 15 | IT/IS | Database Administrator | Married | 82264.0 |
| 16 | IT/IS | Database Administrator | Married | 83200.0 |
| 17 | IT/IS | Database Administrator | Married | 87776.0 |
| 18 | IT/IS | Database Administrator | Married | 88920.0 |
| 19 | IT/IS | IT Support | Single | 54080.0 |
| 20 | IT/IS | IT Support | Married | 57179.2 |
| 21 | IT/IS | IT Support | Single | 60299.2 |
| 22 | IT/IS | IT Support | Single | 65312.0 |
| 23 | IT/IS | Network Engineer | Married | 56160.0 |
| 24 | IT/IS | Network Engineer | Married | 76960.0 |
| 25 | IT/IS | Network Engineer | Separated | 81120.0 |
| 26 | IT/IS | Network Engineer | Married | 87360.0 |
| 27 | IT/IS | Network Engineer | Single | 89440.0 |
| 28 | IT/IS | Network Engineer | Divorced | 93600.0 |
| 29 | IT/IS | Network Engineer | Married | 97760.0 |
| 30 | IT/IS | Network Engineer | Married | 102128.0 |
| 31 | IT/IS | Senior BI Developer | Single | 104520.0 |
| 32 | IT/IS | Senior BI Developer | Single | 106080.0 |
| 33 | IT/IS | Senior BI Developer | Single | 108680.0 |
| 34 | IT/IS | Sr. Network Engineer | Single | 110240.0 |
| 35 | IT/IS | Sr. Network Engineer | Married | 111904.0 |
| 36 | IT/IS | Sr. Network Engineer | Married | 112528.0 |
| 37 | IT/IS | Sr. Network Engineer | Widowed | 114816.0 |
| 38 | IT/IS | Sr. Network Engineer | Married | 116896.0 |
| 39 | Production | Production Manager | Single | 106080.0 |
| 40 | Production | Production Manager | Married | 108160.0 |
| 41 | Production | Production Manager | Divorced | 110240.0 |
| 42 | Production | Production Manager | Single | 110240.0 |
| 43 | Production | Production Manager | Divorced | 112320.0 |
| 44 | Production | Production Manager | Divorced | 113360.0 |
| 45 | Production | Production Manager | Married | 114400.0 |
| 46 | Production | Production Manager | Married | 114400.0 |
| 47 | Production | Production Manager | Single | 114400.0 |
| 48 | Production | Production Technician I | Divorced | 29120.0 |
| 49 | Production | Production Technician I | Single | 29120.0 |
| 50 | Production | Production Technician I | Single | 29120.0 |
| 51 | Production | Production Technician I | Divorced | 31200.0 |
| 52 | Production | Production Technician I | Married | 31200.0 |
| 53 | Production | Production Technician I | Married | 31200.0 |
| 54 | Production | Production Technician I | Married | 31200.0 |
| 55 | Production | Production Technician I | Separated | 31200.0 |
| 56 | Production | Production Technician I | Single | 31200.0 |
| 57 | Production | Production Technician I | Widowed | 31200.0 |
| 58 | Production | Production Technician I | Single | 31616.0 |
| 59 | Production | Production Technician I | Married | 32760.0 |
| 60 | Production | Production Technician I | Divorced | 33280.0 |
| 61 | Production | Production Technician I | Married | 33280.0 |
| 62 | Production | Production Technician I | Married | 33280.0 |
| 63 | Production | Production Technician I | Married | 33280.0 |
| 64 | Production | Production Technician I | Single | 33280.0 |
| 65 | Production | Production Technician I | Single | 33280.0 |
| 66 | Production | Production Technician I | Single | 33280.0 |
| 67 | Production | Production Technician I | Single | 33280.0 |
| 68 | Production | Production Technician I | Single | 33280.0 |
| 69 | Production | Production Technician I | Single | 34840.0 |
| 70 | Production | Production Technician I | Single | 34860.8 |
| 71 | Production | Production Technician I | Married | 35360.0 |
| 72 | Production | Production Technician I | Married | 35360.0 |
| 73 | Production | Production Technician I | Separated | 35360.0 |
| 74 | Production | Production Technician I | Single | 35360.0 |
| 75 | Production | Production Technician I | Single | 35360.0 |
| 76 | Production | Production Technician I | Single | 35360.0 |
| 77 | Production | Production Technician I | Single | 35360.0 |
| 78 | Production | Production Technician I | Widowed | 35360.0 |
| 79 | Production | Production Technician I | Married | 37440.0 |
| 80 | Production | Production Technician I | Single | 37440.0 |
| 81 | Production | Production Technician I | Married | 39520.0 |
| 82 | Production | Production Technician I | Married | 39520.0 |
| 83 | Production | Production Technician I | Separated | 39520.0 |
| 84 | Production | Production Technician I | Single | 39520.0 |
| 85 | Production | Production Technician I | Single | 39520.0 |
| 86 | Production | Production Technician I | Single | 39520.0 |
| 87 | Production | Production Technician I | Single | 39520.0 |
| 88 | Production | Production Technician I | Single | 40560.0 |
| 89 | Production | Production Technician I | Single | 41080.0 |
| 90 | Production | Production Technician I | Married | 41600.0 |
| 91 | Production | Production Technician I | Married | 41600.0 |
| 92 | Production | Production Technician I | Married | 41600.0 |
| 93 | Production | Production Technician I | Married | 41600.0 |
| 94 | Production | Production Technician I | Separated | 41600.0 |
| 95 | Production | Production Technician I | Single | 41600.0 |
| 96 | Production | Production Technician I | Single | 41600.0 |
| 97 | Production | Production Technician I | Single | 41600.0 |
| 98 | Production | Production Technician I | Single | 41600.0 |
| 99 | Production | Production Technician I | Single | 41600.0 |
| 100 | Production | Production Technician I | Single | 41600.0 |
| 101 | Production | Production Technician I | Divorced | 43680.0 |
| 102 | Production | Production Technician I | Married | 43680.0 |
| 103 | Production | Production Technician I | Married | 43680.0 |
| 104 | Production | Production Technician I | Married | 43680.0 |
| 105 | Production | Production Technician I | Married | 43680.0 |
| 106 | Production | Production Technician I | Married | 43680.0 |
| 107 | Production | Production Technician I | Married | 43680.0 |
| 108 | Production | Production Technician I | Single | 43680.0 |
| 109 | Production | Production Technician I | Single | 44200.0 |
| 110 | Production | Production Technician I | Divorced | 45760.0 |
| 111 | Production | Production Technician I | Married | 45760.0 |
| 112 | Production | Production Technician I | Married | 45760.0 |
| 113 | Production | Production Technician I | Married | 45760.0 |
| 114 | Production | Production Technician I | Single | 45760.0 |
| 115 | Production | Production Technician I | Single | 45760.0 |
| 116 | Production | Production Technician I | Single | 45760.0 |
| 117 | Production | Production Technician I | Single | 45760.0 |
| 118 | Production | Production Technician I | Single | 45760.0 |
| 119 | Production | Production Technician I | Single | 45760.0 |
| 120 | Production | Production Technician I | Widowed | 45760.0 |
| 121 | Production | Production Technician I | Married | 47840.0 |
| 122 | Production | Production Technician I | Single | 47840.0 |
| 123 | Production | Production Technician I | Single | 47840.0 |
| 124 | Production | Production Technician I | Divorced | 49920.0 |
| 125 | Production | Production Technician I | Married | 49920.0 |
| 126 | Production | Production Technician I | Married | 49920.0 |
| 127 | Production | Production Technician I | Married | 49920.0 |
| 128 | Production | Production Technician I | Married | 49920.0 |
| 129 | Production | Production Technician I | Single | 49920.0 |
| 130 | Production | Production Technician I | Single | 50960.0 |
| 131 | Production | Production Technician I | Married | 51480.0 |
| 132 | Production | Production Technician II | Married | 45760.0 |
| 133 | Production | Production Technician II | Single | 45760.0 |
| 134 | Production | Production Technician II | Single | 45760.0 |
| 135 | Production | Production Technician II | Single | 45760.0 |
| 136 | Production | Production Technician II | Single | 45760.0 |
| 137 | Production | Production Technician II | Divorced | 46800.0 |
| 138 | Production | Production Technician II | Married | 47840.0 |
| 139 | Production | Production Technician II | Separated | 49920.0 |
| 140 | Production | Production Technician II | Single | 49920.0 |
| 141 | Production | Production Technician II | Single | 49920.0 |
| 142 | Production | Production Technician II | Married | 50440.0 |
| 143 | Production | Production Technician II | Single | 50440.0 |
| 144 | Production | Production Technician II | Single | 52000.0 |
| 145 | Production | Production Technician II | Single | 52000.0 |
| 146 | Production | Production Technician II | Single | 52000.0 |
| 147 | Production | Production Technician II | Single | 52000.0 |
| 148 | Production | Production Technician II | Single | 52000.0 |
| 149 | Production | Production Technician II | Married | 54080.0 |
| 150 | Production | Production Technician II | Married | 54080.0 |
| 151 | Production | Production Technician II | Single | 54080.0 |
| 152 | Production | Production Technician II | Single | 54080.0 |
| 153 | Production | Production Technician II | Single | 54288.0 |
| 154 | Production | Production Technician II | Single | 54891.2 |
| 155 | Production | Production Technician II | Married | 56160.0 |
| 156 | Production | Production Technician II | Married | 56160.0 |
| 157 | Production | Production Technician II | Married | 56160.0 |
| 158 | Production | Production Technician II | Separated | 56160.0 |
| 159 | Production | Production Technician II | Single | 56160.0 |
| 160 | Production | Production Technician II | Separated | 58240.0 |
| 161 | Production | Production Technician II | Married | 60320.0 |
| 162 | Production | Production Technician II | Separated | 60320.0 |
| 163 | Sales | Area Sales Manager | Married | 112320.0 |
| 164 | Sales | Area Sales Manager | Married | 114400.0 |
| 165 | Sales | Area Sales Manager | Married | 114400.0 |
| 166 | Sales | Area Sales Manager | Married | 114400.0 |
| 167 | Sales | Area Sales Manager | Married | 114400.0 |
| 168 | Sales | Area Sales Manager | Separated | 114400.0 |
| 169 | Sales | Area Sales Manager | Separated | 114400.0 |
| 170 | Sales | Area Sales Manager | Single | 114400.0 |
| 171 | Sales | Area Sales Manager | Single | 114400.0 |
| 172 | Sales | Area Sales Manager | Single | 114400.0 |
| 173 | Sales | Area Sales Manager | Single | 114400.0 |
| 174 | Sales | Area Sales Manager | Single | 114400.0 |
| 175 | Sales | Area Sales Manager | Single | 114400.0 |
| 176 | Sales | Area Sales Manager | Single | 114400.0 |
| 177 | Sales | Area Sales Manager | Single | 114400.0 |
| 178 | Sales | Area Sales Manager | Single | 114400.0 |
| 179 | Sales | Area Sales Manager | Single | 114400.0 |
| 180 | Sales | Area Sales Manager | Married | 115440.0 |
| 181 | Sales | Area Sales Manager | Single | 115440.0 |
| 182 | Sales | Area Sales Manager | Married | 116480.0 |
| 183 | Sales | Area Sales Manager | Married | 116480.0 |
| 184 | Sales | Area Sales Manager | Single | 116480.0 |
| 185 | Sales | Area Sales Manager | Single | 116480.0 |
| 186 | Sales | Area Sales Manager | Single | 118560.0 |
| 187 | Sales | Sales Manager | Single | 112320.0 |
| 188 | Sales | Sales Manager | Divorced | 116480.0 |
| 189 | Software Engineering | Software Engineer | Single | 99008.0 |
| 190 | Software Engineering | Software Engineer | Single | 99840.0 |
| 191 | Software Engineering | Software Engineer | Married | 102440.0 |
| 192 | Software Engineering | Software Engineer | Single | 115460.8 |
| 193 | Software Engineering | Software Engineer | Single | 116480.0 |
| 194 | Software Engineering | Software Engineer | Single | 118809.6 |
# Build a grid of plots of salary versus employees' marital status, broken down by position and department
g=sns.catplot(
kind="strip",
x="position",
y="USD per Year",
hue="maritaldesc",
data=dfg_salary_over_marital,
col="department",
col_wrap=3,
height=4.5,
aspect = 0.8,
palette="tab10",
margin_titles=True,
sharex=False,
sharey=False,
marker='D',
s=10,
jitter=True,
alpha=0.5)
g.fig.suptitle("Annual salary versus employees' marital status for comparable positions",
               fontsize=16, x=0.50, y=1.03)
# Set the font size of the x-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the y-axis label
g.set_ylabels(fontsize=10)
# Set the size and rotation of the x tick labels
g.set_xticklabels(rotation=90,fontsize=9)
# Adjust the vertical spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
There appears to be some relationship between marital status and pay rate, and it is specific to each position.
Notably, the highest salaries most often go to single employees.
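One way to check a statement like this is to find, for each position, which marital status holds the highest annual salary and then count how often each status comes out on top. The sketch below uses only the Python standard library and a hand-copied excerpt of the table above (the top rows of four positions). The names `rows`, `top_statuses`, `tops` and `wins` are illustrative, and the counts from such a small excerpt say nothing about the full dataset.

```python
from collections import Counter, defaultdict

# Excerpt of (position, maritaldesc, "USD per Year") rows copied from the table above.
rows = [
    ("Production Technician II", "Separated", 58240.0),
    ("Production Technician II", "Married", 60320.0),
    ("Production Technician II", "Separated", 60320.0),
    ("Area Sales Manager", "Single", 116480.0),
    ("Area Sales Manager", "Single", 118560.0),
    ("Sales Manager", "Single", 112320.0),
    ("Sales Manager", "Divorced", 116480.0),
    ("Software Engineer", "Single", 116480.0),
    ("Software Engineer", "Single", 118809.6),
]

def top_statuses(rows):
    """Return {position: set of statuses that hold the maximum salary}."""
    by_position = defaultdict(list)
    for position, status, salary in rows:
        by_position[position].append((salary, status))
    result = {}
    for position, pairs in by_position.items():
        best = max(salary for salary, _ in pairs)
        result[position] = {status for salary, status in pairs if salary == best}
    return result

tops = top_statuses(rows)
# Count in how many positions each status is a (co-)holder of the top salary.
wins = Counter(status for statuses in tops.values() for status in statuses)
print(tops)
print(wins)
```

Ties are kept on purpose: when two statuses share the maximum salary for a position, both are counted.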
sql_quiery = \
"""
-- Build a query and dataframe with race/ethnicity and pay rates for positions
-- held by several employees
-- It is used to draw the grid of plots
-- To do this, define a subquery with employee numbers, race/ethnicity, and pay rates
WITH
EmployeesNationAndPayrate AS
(SELECT
"Employee Number",
racedesc,
ROUND("Pay Rate"::numeric * 2080, 2) AS "USD per Year"
FROM
hr_dataset
)
SELECT
department,
position,
racedesc,
"USD per Year"
FROM
PositionsWithMultActiveEmployees
LEFT JOIN
EmployeesNationAndPayrate
USING ("Employee Number")
ORDER BY
department,
position,
"USD per Year",
racedesc
;
"""
dfg_salary_over_nation = pd.read_sql(sql_quiery, conn)
dfg_salary_over_nation
| | department | position | racedesc | USD per Year |
|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | Black or African American | 47840.0 |
| 1 | Admin Offices | Accountant I | Black or African American | 59280.0 |
| 2 | Admin Offices | Accountant I | White | 60320.0 |
| 3 | Admin Offices | Administrative Assistant | White | 34444.8 |
| 4 | Admin Offices | Administrative Assistant | White | 44720.0 |
| 5 | Admin Offices | Sr. Accountant | Asian | 72696.0 |
| 6 | Admin Offices | Sr. Accountant | White | 72696.0 |
| 7 | IT/IS | BI Developer | Black or African American | 93600.0 |
| 8 | IT/IS | BI Developer | Black or African American | 93600.0 |
| 9 | IT/IS | BI Developer | White | 93600.0 |
| 10 | IT/IS | BI Developer | White | 95680.0 |
| 11 | IT/IS | Database Administrator | White | 62816.0 |
| 12 | IT/IS | Database Administrator | White | 65312.0 |
| 13 | IT/IS | Database Administrator | White | 70720.0 |
| 14 | IT/IS | Database Administrator | Black or African American | 73840.0 |
| 15 | IT/IS | Database Administrator | White | 82264.0 |
| 16 | IT/IS | Database Administrator | Asian | 83200.0 |
| 17 | IT/IS | Database Administrator | White | 87776.0 |
| 18 | IT/IS | Database Administrator | Asian | 88920.0 |
| 19 | IT/IS | IT Support | Two or more races | 54080.0 |
| 20 | IT/IS | IT Support | Black or African American | 57179.2 |
| 21 | IT/IS | IT Support | White | 60299.2 |
| 22 | IT/IS | IT Support | White | 65312.0 |
| 23 | IT/IS | Network Engineer | White | 56160.0 |
| 24 | IT/IS | Network Engineer | White | 76960.0 |
| 25 | IT/IS | Network Engineer | White | 81120.0 |
| 26 | IT/IS | Network Engineer | White | 87360.0 |
| 27 | IT/IS | Network Engineer | White | 89440.0 |
| 28 | IT/IS | Network Engineer | White | 93600.0 |
| 29 | IT/IS | Network Engineer | White | 97760.0 |
| 30 | IT/IS | Network Engineer | White | 102128.0 |
| 31 | IT/IS | Senior BI Developer | Asian | 104520.0 |
| 32 | IT/IS | Senior BI Developer | Asian | 106080.0 |
| 33 | IT/IS | Senior BI Developer | Asian | 108680.0 |
| 34 | IT/IS | Sr. Network Engineer | White | 110240.0 |
| 35 | IT/IS | Sr. Network Engineer | Asian | 111904.0 |
| 36 | IT/IS | Sr. Network Engineer | White | 112528.0 |
| 37 | IT/IS | Sr. Network Engineer | Asian | 114816.0 |
| 38 | IT/IS | Sr. Network Engineer | White | 116896.0 |
| 39 | Production | Production Manager | White | 106080.0 |
| 40 | Production | Production Manager | White | 108160.0 |
| 41 | Production | Production Manager | Hispanic | 110240.0 |
| 42 | Production | Production Manager | White | 110240.0 |
| 43 | Production | Production Manager | White | 112320.0 |
| 44 | Production | Production Manager | White | 113360.0 |
| 45 | Production | Production Manager | Black or African American | 114400.0 |
| 46 | Production | Production Manager | White | 114400.0 |
| 47 | Production | Production Manager | White | 114400.0 |
| 48 | Production | Production Technician I | Asian | 29120.0 |
| 49 | Production | Production Technician I | Black or African American | 29120.0 |
| 50 | Production | Production Technician I | Two or more races | 29120.0 |
| 51 | Production | Production Technician I | Asian | 31200.0 |
| 52 | Production | Production Technician I | Asian | 31200.0 |
| 53 | Production | Production Technician I | White | 31200.0 |
| 54 | Production | Production Technician I | White | 31200.0 |
| 55 | Production | Production Technician I | White | 31200.0 |
| 56 | Production | Production Technician I | White | 31200.0 |
| 57 | Production | Production Technician I | White | 31200.0 |
| 58 | Production | Production Technician I | Asian | 31616.0 |
| 59 | Production | Production Technician I | White | 32760.0 |
| 60 | Production | Production Technician I | American Indian or Alaska Native | 33280.0 |
| 61 | Production | Production Technician I | Two or more races | 33280.0 |
| 62 | Production | Production Technician I | White | 33280.0 |
| 63 | Production | Production Technician I | White | 33280.0 |
| 64 | Production | Production Technician I | White | 33280.0 |
| 65 | Production | Production Technician I | White | 33280.0 |
| 66 | Production | Production Technician I | White | 33280.0 |
| 67 | Production | Production Technician I | White | 33280.0 |
| 68 | Production | Production Technician I | White | 33280.0 |
| 69 | Production | Production Technician I | White | 34840.0 |
| 70 | Production | Production Technician I | Two or more races | 34860.8 |
| 71 | Production | Production Technician I | Asian | 35360.0 |
| 72 | Production | Production Technician I | Black or African American | 35360.0 |
| 73 | Production | Production Technician I | White | 35360.0 |
| 74 | Production | Production Technician I | White | 35360.0 |
| 75 | Production | Production Technician I | White | 35360.0 |
| 76 | Production | Production Technician I | White | 35360.0 |
| 77 | Production | Production Technician I | White | 35360.0 |
| 78 | Production | Production Technician I | White | 35360.0 |
| 79 | Production | Production Technician I | Asian | 37440.0 |
| 80 | Production | Production Technician I | White | 37440.0 |
| 81 | Production | Production Technician I | Black or African American | 39520.0 |
| 82 | Production | Production Technician I | White | 39520.0 |
| 83 | Production | Production Technician I | White | 39520.0 |
| 84 | Production | Production Technician I | White | 39520.0 |
| 85 | Production | Production Technician I | White | 39520.0 |
| 86 | Production | Production Technician I | White | 39520.0 |
| 87 | Production | Production Technician I | White | 39520.0 |
| 88 | Production | Production Technician I | White | 40560.0 |
| 89 | Production | Production Technician I | White | 41080.0 |
| 90 | Production | Production Technician I | Black or African American | 41600.0 |
| 91 | Production | Production Technician I | Black or African American | 41600.0 |
| 92 | Production | Production Technician I | Black or African American | 41600.0 |
| 93 | Production | Production Technician I | Black or African American | 41600.0 |
| 94 | Production | Production Technician I | White | 41600.0 |
| 95 | Production | Production Technician I | White | 41600.0 |
| 96 | Production | Production Technician I | White | 41600.0 |
| 97 | Production | Production Technician I | White | 41600.0 |
| 98 | Production | Production Technician I | White | 41600.0 |
| 99 | Production | Production Technician I | White | 41600.0 |
| 100 | Production | Production Technician I | White | 41600.0 |
| 101 | Production | Production Technician I | Asian | 43680.0 |
| 102 | Production | Production Technician I | Asian | 43680.0 |
| 103 | Production | Production Technician I | Black or African American | 43680.0 |
| 104 | Production | Production Technician I | White | 43680.0 |
| 105 | Production | Production Technician I | White | 43680.0 |
| 106 | Production | Production Technician I | White | 43680.0 |
| 107 | Production | Production Technician I | White | 43680.0 |
| 108 | Production | Production Technician I | White | 43680.0 |
| 109 | Production | Production Technician I | White | 44200.0 |
| 110 | Production | Production Technician I | Asian | 45760.0 |
| 111 | Production | Production Technician I | Asian | 45760.0 |
| 112 | Production | Production Technician I | Asian | 45760.0 |
| 113 | Production | Production Technician I | Black or African American | 45760.0 |
| 114 | Production | Production Technician I | Black or African American | 45760.0 |
| 115 | Production | Production Technician I | Black or African American | 45760.0 |
| 116 | Production | Production Technician I | Two or more races | 45760.0 |
| 117 | Production | Production Technician I | White | 45760.0 |
| 118 | Production | Production Technician I | White | 45760.0 |
| 119 | Production | Production Technician I | White | 45760.0 |
| 120 | Production | Production Technician I | White | 45760.0 |
| 121 | Production | Production Technician I | White | 47840.0 |
| 122 | Production | Production Technician I | White | 47840.0 |
| 123 | Production | Production Technician I | White | 47840.0 |
| 124 | Production | Production Technician I | Asian | 49920.0 |
| 125 | Production | Production Technician I | Black or African American | 49920.0 |
| 126 | Production | Production Technician I | Black or African American | 49920.0 |
| 127 | Production | Production Technician I | Black or African American | 49920.0 |
| 128 | Production | Production Technician I | White | 49920.0 |
| 129 | Production | Production Technician I | White | 49920.0 |
| 130 | Production | Production Technician I | White | 50960.0 |
| 131 | Production | Production Technician I | White | 51480.0 |
| 132 | Production | Production Technician II | Black or African American | 45760.0 |
| 133 | Production | Production Technician II | White | 45760.0 |
| 134 | Production | Production Technician II | White | 45760.0 |
| 135 | Production | Production Technician II | White | 45760.0 |
| 136 | Production | Production Technician II | White | 45760.0 |
| 137 | Production | Production Technician II | American Indian or Alaska Native | 46800.0 |
| 138 | Production | Production Technician II | White | 47840.0 |
| 139 | Production | Production Technician II | Black or African American | 49920.0 |
| 140 | Production | Production Technician II | White | 49920.0 |
| 141 | Production | Production Technician II | White | 49920.0 |
| 142 | Production | Production Technician II | White | 50440.0 |
| 143 | Production | Production Technician II | White | 50440.0 |
| 144 | Production | Production Technician II | Black or African American | 52000.0 |
| 145 | Production | Production Technician II | Black or African American | 52000.0 |
| 146 | Production | Production Technician II | Two or more races | 52000.0 |
| 147 | Production | Production Technician II | White | 52000.0 |
| 148 | Production | Production Technician II | White | 52000.0 |
| 149 | Production | Production Technician II | Asian | 54080.0 |
| 150 | Production | Production Technician II | Hispanic | 54080.0 |
| 151 | Production | Production Technician II | White | 54080.0 |
| 152 | Production | Production Technician II | White | 54080.0 |
| 153 | Production | Production Technician II | Black or African American | 54288.0 |
| 154 | Production | Production Technician II | White | 54891.2 |
| 155 | Production | Production Technician II | American Indian or Alaska Native | 56160.0 |
| 156 | Production | Production Technician II | Black or African American | 56160.0 |
| 157 | Production | Production Technician II | White | 56160.0 |
| 158 | Production | Production Technician II | White | 56160.0 |
| 159 | Production | Production Technician II | White | 56160.0 |
| 160 | Production | Production Technician II | White | 58240.0 |
| 161 | Production | Production Technician II | Black or African American | 60320.0 |
| 162 | Production | Production Technician II | White | 60320.0 |
| 163 | Sales | Area Sales Manager | Black or African American | 112320.0 |
| 164 | Sales | Area Sales Manager | Black or African American | 114400.0 |
| 165 | Sales | Area Sales Manager | Black or African American | 114400.0 |
| 166 | Sales | Area Sales Manager | Black or African American | 114400.0 |
| 167 | Sales | Area Sales Manager | Black or African American | 114400.0 |
| 168 | Sales | Area Sales Manager | Two or more races | 114400.0 |
| 169 | Sales | Area Sales Manager | Two or more races | 114400.0 |
| 170 | Sales | Area Sales Manager | Two or more races | 114400.0 |
| 171 | Sales | Area Sales Manager | Two or more races | 114400.0 |
| 172 | Sales | Area Sales Manager | White | 114400.0 |
| 173 | Sales | Area Sales Manager | White | 114400.0 |
| 174 | Sales | Area Sales Manager | White | 114400.0 |
| 175 | Sales | Area Sales Manager | White | 114400.0 |
| 176 | Sales | Area Sales Manager | White | 114400.0 |
| 177 | Sales | Area Sales Manager | White | 114400.0 |
| 178 | Sales | Area Sales Manager | White | 114400.0 |
| 179 | Sales | Area Sales Manager | White | 114400.0 |
| 180 | Sales | Area Sales Manager | Two or more races | 115440.0 |
| 181 | Sales | Area Sales Manager | White | 115440.0 |
| 182 | Sales | Area Sales Manager | American Indian or Alaska Native | 116480.0 |
| 183 | Sales | Area Sales Manager | Asian | 116480.0 |
| 184 | Sales | Area Sales Manager | Black or African American | 116480.0 |
| 185 | Sales | Area Sales Manager | White | 116480.0 |
| 186 | Sales | Area Sales Manager | White | 118560.0 |
| 187 | Sales | Sales Manager | White | 112320.0 |
| 188 | Sales | Sales Manager | Black or African American | 116480.0 |
| 189 | Software Engineering | Software Engineer | White | 99008.0 |
| 190 | Software Engineering | Software Engineer | White | 99840.0 |
| 191 | Software Engineering | Software Engineer | White | 102440.0 |
| 192 | Software Engineering | Software Engineer | Asian | 115460.8 |
| 193 | Software Engineering | Software Engineer | White | 116480.0 |
| 194 | Software Engineering | Software Engineer | Black or African American | 118809.6 |
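The "USD per Year" column comes from the query expression `ROUND("Pay Rate"::numeric * 2080, 2)`. The sketch below reproduces that arithmetic in Python. It assumes that "Pay Rate" is an hourly rate and that 2080 stands for 40 hours × 52 weeks, which the source does not state explicitly; `annual_salary` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_YEAR = 40 * 52  # = 2080, the multiplier used in the query (assumed meaning)

def annual_salary(pay_rate: str) -> Decimal:
    """Multiply an hourly rate by 2080 and round to cents, like ROUND(x::numeric * 2080, 2)."""
    amount = Decimal(pay_rate) * HOURS_PER_YEAR
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Rates back-computed from the table: 29120.0 / 2080 = 14 and 34444.8 / 2080 = 16.56.
print(annual_salary("14"))     # 29120.00
print(annual_salary("16.56"))  # 34444.80
```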
# Build a grid of plots of salary versus employees' race/ethnicity,
# broken down by position and department
g=sns.catplot(
kind="strip",
x="position",
y="USD per Year",
hue="racedesc",
data=dfg_salary_over_nation,
col="department",
col_wrap=3,
height=4.5,
aspect = 0.8,
palette="Set1",
margin_titles=True,
sharex=False,
sharey=False,
marker='D',
s=10,
jitter=True,
alpha=0.75)
g.fig.suptitle(
    "Annual salary versus employees' race/ethnicity for comparable positions",
    fontsize=16, x=0.50, y=1.03)
# Set the font size of the x-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the y-axis label
g.set_ylabels(fontsize=10)
# Set the size and rotation of the x tick labels
g.set_xticklabels(rotation=90,fontsize=9)
# Adjust the vertical spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
Overall, White employees receive the highest salaries. The lowest salaries go, with a few exceptions, to Black or African American employees and employees of two or more races.
(To convert days to years: 1 year = 360 days)
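The conversion in the note above can be written out as a small function. This is a minimal sketch that mirrors the `ROUND("Days Employed"::numeric / 360, 1)` expression used by the queries in this part of the notebook; `days_to_years` is a hypothetical helper name and the day counts are made-up inputs.

```python
from decimal import Decimal, ROUND_HALF_UP

DAYS_PER_YEAR = 360  # convention used in this analysis, not a calendar year

def days_to_years(days: int, digits: int = 1) -> Decimal:
    """Convert days to years with a 360-day year, rounding half away from zero."""
    step = Decimal(10) ** -digits  # 0.1 for digits=1, 0.01 for digits=2
    return (Decimal(days) / DAYS_PER_YEAR).quantize(step, rounding=ROUND_HALF_UP)

print(days_to_years(360))      # 1.0
print(days_to_years(1000))     # 2.8
print(days_to_years(1000, 2))  # 2.78
```

With a 360-day year the result is slightly larger than with a 365-day calendar year, so the terms shown here are a little overstated relative to calendar years.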
# Build a table for analyzing the distribution of time employed at the company by employment status.
sql_quiery = \
"""
WITH
status_term_schedule AS
(SELECT
"Employment Status",
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
COUNT("Employee Number") AS empl_count
FROM
hr_dataset
GROUP BY
"Employment Status",
"Years Employed"
ORDER BY
"Employment Status",
"Years Employed"
),
status_median_term AS
(SELECT
"Employment Status",
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "Years Employed") AS median_term
FROM
status_term_schedule
GROUP BY
"Employment Status"
)
SELECT
"Employment Status",
empl_count,
median_term,
"Years Employed"
FROM
status_term_schedule
LEFT JOIN
status_median_term
USING("Employment Status")
;
"""
dfg_term_over_status = pd.read_sql(sql_quiery, conn)
dfg_term_over_status
| | Employment Status | empl_count | median_term | Years Employed |
|---|---|---|---|---|
| 0 | Active | 1 | 5.05 | 0.2 |
| 1 | Active | 2 | 5.05 | 0.6 |
| 2 | Active | 3 | 5.05 | 0.8 |
| 3 | Active | 1 | 5.05 | 0.9 |
| 4 | Active | 3 | 5.05 | 1.2 |
| 5 | Active | 2 | 5.05 | 1.9 |
| 6 | Active | 1 | 5.05 | 2.5 |
| 7 | Active | 2 | 5.05 | 2.6 |
| 8 | Active | 11 | 5.05 | 2.7 |
| 9 | Active | 4 | 5.05 | 2.8 |
| 10 | Active | 7 | 5.05 | 2.9 |
| 11 | Active | 6 | 5.05 | 3.1 |
| 12 | Active | 13 | 5.05 | 3.2 |
| 13 | Active | 3 | 5.05 | 3.3 |
| 14 | Active | 7 | 5.05 | 3.4 |
| 15 | Active | 8 | 5.05 | 3.6 |
| 16 | Active | 2 | 5.05 | 3.7 |
| 17 | Active | 5 | 5.05 | 3.8 |
| 18 | Active | 6 | 5.05 | 3.9 |
| 19 | Active | 6 | 5.05 | 4.1 |
| 20 | Active | 6 | 5.05 | 4.2 |
| 21 | Active | 5 | 5.05 | 4.3 |
| 22 | Active | 6 | 5.05 | 4.5 |
| 23 | Active | 2 | 5.05 | 4.6 |
| 24 | Active | 2 | 5.05 | 4.7 |
| 25 | Active | 1 | 5.05 | 4.8 |
| 26 | Active | 1 | 5.05 | 4.9 |
| 27 | Active | 2 | 5.05 | 5.0 |
| 28 | Active | 2 | 5.05 | 5.1 |
| 29 | Active | 1 | 5.05 | 5.2 |
| 30 | Active | 1 | 5.05 | 5.3 |
| 31 | Active | 2 | 5.05 | 5.4 |
| 32 | Active | 4 | 5.05 | 5.5 |
| 33 | Active | 3 | 5.05 | 5.6 |
| 34 | Active | 4 | 5.05 | 5.7 |
| 35 | Active | 2 | 5.05 | 5.8 |
| 36 | Active | 5 | 5.05 | 5.9 |
| 37 | Active | 4 | 5.05 | 6.0 |
| 38 | Active | 4 | 5.05 | 6.1 |
| 39 | Active | 1 | 5.05 | 6.2 |
| 40 | Active | 1 | 5.05 | 6.3 |
| 41 | Active | 2 | 5.05 | 6.4 |
| 42 | Active | 4 | 5.05 | 6.5 |
| 43 | Active | 3 | 5.05 | 6.6 |
| 44 | Active | 4 | 5.05 | 6.7 |
| 45 | Active | 1 | 5.05 | 6.8 |
| 46 | Active | 1 | 5.05 | 6.9 |
| 47 | Active | 3 | 5.05 | 7.0 |
| 48 | Active | 2 | 5.05 | 7.3 |
| 49 | Active | 1 | 5.05 | 7.5 |
| 50 | Active | 3 | 5.05 | 7.7 |
| 51 | Active | 3 | 5.05 | 9.0 |
| 52 | Active | 1 | 5.05 | 9.2 |
| 53 | Active | 1 | 5.05 | 10.0 |
| 54 | Active | 1 | 5.05 | 10.2 |
| 55 | Active | 1 | 5.05 | 12.1 |
| 56 | Future Start | 7 | 1.60 | 1.4 |
| 57 | Future Start | 1 | 1.60 | 1.5 |
| 58 | Future Start | 1 | 1.60 | 1.6 |
| 59 | Future Start | 1 | 1.60 | 2.4 |
| 60 | Future Start | 1 | 1.60 | 2.5 |
| 61 | Leave of Absence | 1 | 4.15 | 2.8 |
| 62 | Leave of Absence | 1 | 4.15 | 2.9 |
| 63 | Leave of Absence | 2 | 4.15 | 3.1 |
| 64 | Leave of Absence | 2 | 4.15 | 3.6 |
| 65 | Leave of Absence | 1 | 4.15 | 4.1 |
| 66 | Leave of Absence | 2 | 4.15 | 4.2 |
| 67 | Leave of Absence | 1 | 4.15 | 4.3 |
| 68 | Leave of Absence | 2 | 4.15 | 4.5 |
| 69 | Leave of Absence | 1 | 4.15 | 6.0 |
| 70 | Leave of Absence | 1 | 4.15 | 8.5 |
| 71 | Terminated for Cause | 2 | 2.05 | 0.0 |
| 72 | Terminated for Cause | 1 | 2.05 | 0.1 |
| 73 | Terminated for Cause | 1 | 2.05 | 0.5 |
| 74 | Terminated for Cause | 3 | 2.05 | 1.2 |
| 75 | Terminated for Cause | 1 | 2.05 | 2.0 |
| 76 | Terminated for Cause | 2 | 2.05 | 2.1 |
| 77 | Terminated for Cause | 1 | 2.05 | 2.5 |
| 78 | Terminated for Cause | 1 | 2.05 | 4.4 |
| 79 | Terminated for Cause | 1 | 2.05 | 5.0 |
| 80 | Terminated for Cause | 1 | 2.05 | 5.4 |
| 81 | Voluntarily Terminated | 3 | 2.20 | 0.0 |
| 82 | Voluntarily Terminated | 3 | 2.20 | 0.1 |
| 83 | Voluntarily Terminated | 5 | 2.20 | 0.2 |
| 84 | Voluntarily Terminated | 4 | 2.20 | 0.3 |
| 85 | Voluntarily Terminated | 1 | 2.20 | 0.4 |
| 86 | Voluntarily Terminated | 3 | 2.20 | 0.5 |
| 87 | Voluntarily Terminated | 1 | 2.20 | 0.6 |
| 88 | Voluntarily Terminated | 3 | 2.20 | 0.7 |
| 89 | Voluntarily Terminated | 2 | 2.20 | 0.8 |
| 90 | Voluntarily Terminated | 2 | 2.20 | 0.9 |
| 91 | Voluntarily Terminated | 3 | 2.20 | 1.1 |
| 92 | Voluntarily Terminated | 6 | 2.20 | 1.2 |
| 93 | Voluntarily Terminated | 1 | 2.20 | 1.3 |
| 94 | Voluntarily Terminated | 3 | 2.20 | 1.4 |
| 95 | Voluntarily Terminated | 1 | 2.20 | 1.5 |
| 96 | Voluntarily Terminated | 2 | 2.20 | 1.6 |
| 97 | Voluntarily Terminated | 1 | 2.20 | 1.7 |
| 98 | Voluntarily Terminated | 1 | 2.20 | 1.9 |
| 99 | Voluntarily Terminated | 2 | 2.20 | 2.0 |
| 100 | Voluntarily Terminated | 3 | 2.20 | 2.1 |
| 101 | Voluntarily Terminated | 2 | 2.20 | 2.2 |
| 102 | Voluntarily Terminated | 1 | 2.20 | 2.5 |
| 103 | Voluntarily Terminated | 2 | 2.20 | 2.6 |
| 104 | Voluntarily Terminated | 1 | 2.20 | 2.9 |
| 105 | Voluntarily Terminated | 2 | 2.20 | 3.0 |
| 106 | Voluntarily Terminated | 2 | 2.20 | 3.1 |
| 107 | Voluntarily Terminated | 3 | 2.20 | 3.2 |
| 108 | Voluntarily Terminated | 2 | 2.20 | 3.3 |
| 109 | Voluntarily Terminated | 2 | 2.20 | 3.5 |
| 110 | Voluntarily Terminated | 3 | 2.20 | 3.7 |
| 111 | Voluntarily Terminated | 2 | 2.20 | 3.9 |
| 112 | Voluntarily Terminated | 2 | 2.20 | 4.0 |
| 113 | Voluntarily Terminated | 2 | 2.20 | 4.1 |
| 114 | Voluntarily Terminated | 1 | 2.20 | 4.4 |
| 115 | Voluntarily Terminated | 4 | 2.20 | 4.5 |
| 116 | Voluntarily Terminated | 2 | 2.20 | 4.7 |
| 117 | Voluntarily Terminated | 1 | 2.20 | 5.1 |
| 118 | Voluntarily Terminated | 1 | 2.20 | 5.3 |
| 119 | Voluntarily Terminated | 1 | 2.20 | 5.5 |
| 120 | Voluntarily Terminated | 1 | 2.20 | 5.6 |
| 121 | Voluntarily Terminated | 1 | 2.20 | 7.2 |
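Note that `status_median_term` takes the median over the grouped rows, so every distinct "Years Employed" value counts once regardless of `empl_count`. The sketch below contrasts that with a per-employee median, using the five "Future Start" rows from the table above; the variable names are illustrative.

```python
from statistics import median

# ("Years Employed", empl_count) for the "Future Start" rows of the table above.
future_start = [(1.4, 7), (1.5, 1), (1.6, 1), (2.4, 1), (2.5, 1)]

# Median over distinct values: what the query computes (median_term = 1.60).
median_of_rows = median(years for years, _ in future_start)

# Median over employees: repeat each value empl_count times first.
per_employee = [years for years, count in future_start for _ in range(count)]
median_of_employees = median(per_employee)

print(median_of_rows)       # 1.6
print(median_of_employees)  # 1.4
```

If a per-employee median is what is wanted, one option in SQL is to apply `PERCENTILE_CONT` to the ungrouped `hr_dataset` rows, as the query by department and position in this notebook does.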
# Plot the distribution of employees by years employed at the company, split by employment status
g=sns.displot(data=dfg_term_over_status,
x="Years Employed",
hue="Employment Status",
hue_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
col="Employment Status",
col_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
col_wrap=2,
multiple="layer",
element="step",
binwidth=0.25,
height=4,
aspect=1.75,
palette="Dark2",
facet_kws=dict(sharex=True,
sharey=True)
)
# Define the x-axis scale:
scale_step = 1  # Step of the x-axis scale; it can be changed
# Build the main range of the scale
scale_span = list(range((int(dfg_term_over_status['Years Employed'].min() / scale_step) * scale_step),
                        int(dfg_term_over_status['Years Employed'].max()),
                        scale_step))
# If needed, add the first and last values to the scale (when they are not already in it)
if dfg_term_over_status['Years Employed'].min() < scale_span[0]:
    scale_span = ([dfg_term_over_status['Years Employed'].min()] + scale_span)
if dfg_term_over_status['Years Employed'].max() > scale_span[-1]:
    scale_span = scale_span + [dfg_term_over_status['Years Employed'].max()]
plt.xticks(ticks=scale_span, fontsize=14, rotation=0)  # Set the x ticks and their font size
plt.yticks(ticks=[0,1,2,3],fontsize=14)  # Set the y ticks and their font size
g.set_xlabels(fontsize=16)  # Font size of the x-axis label
g.set_ylabels(fontsize=16)  # Font size of the y-axis label
# Figure title:
g.fig.suptitle("Distribution of time employed at the company by employment status", fontsize=20, y=1.0125)
# Add a vertical line at the median time employed at the company
# To do this, define a function that draws the vertical line and its label
def years_employed_lines(x, **kwargs):
    # Draw the median line from the values passed in x: color, width, style
    plt.axvline(x.unique()[0], color='navy', linewidth=2, linestyle=':')
    # Label the median line
    plt.annotate(
        text=f"median {x.unique()[0]}",  # Annotation text for the median line
        xy=(x.unique()[0]-0.15, 2),  # Label position in data coordinates
        horizontalalignment='center',  # Horizontal text alignment
        verticalalignment='top',  # Vertical text alignment
        rotation="vertical",  # Label rotation
        color='navy',  # Label color
        alpha=1,
        fontsize=12
    )
# Apply the line-drawing function to every facet,
# passing it the median terms
g.map(years_employed_lines, 'median_term')
plt.show()
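Because `dfg_term_over_status` holds one row per distinct term, a histogram of its rows counts distinct values rather than employees. seaborn's `displot` accepts a `weights` argument, so passing `weights="empl_count"` is one option to consider. The standard-library sketch below shows the difference on the "Terminated for Cause" rows with the same bin width of 0.25. `bin_counts` is a hypothetical helper, and its bin edges start at 0, which may differ from the edges seaborn chooses.

```python
import math
from collections import Counter

BIN_WIDTH = 0.25

# ("Years Employed", empl_count) for the "Terminated for Cause" rows of the table above.
terminated_for_cause = [
    (0.0, 2), (0.1, 1), (0.5, 1), (1.2, 3), (2.0, 1),
    (2.1, 2), (2.5, 1), (4.4, 1), (5.0, 1), (5.4, 1),
]

def bin_counts(rows, weighted):
    """Sum rows (or their empl_count when weighted) per bin, keyed by left bin edge."""
    counts = Counter()
    for years, empl_count in rows:
        left_edge = math.floor(years / BIN_WIDTH) * BIN_WIDTH
        counts[left_edge] += empl_count if weighted else 1
    return dict(sorted(counts.items()))

unweighted = bin_counts(terminated_for_cause, weighted=False)
weighted = bin_counts(terminated_for_cause, weighted=True)
print(unweighted)  # bar heights when every table row counts once
print(weighted)    # bar heights when every employee counts once
```

The unweighted heights add up to the 10 table rows, while the weighted heights add up to the 14 employees those rows represent.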
CONCLUSION
# Build a table for studying how time employed at the company relates to the position held
sql_quiery = \
"""
SELECT
COALESCE(department, '[TOTAL]') as department,
COALESCE(position, '[SUBTOTAL department]') AS position,
COUNT("Employee Number"),
ROUND(MIN("Days Employed")::numeric / 360, 2) AS min_years_empld,
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "Days Employed")::numeric / 360, 2) AS median_years_empld,
ROUND(MAX("Days Employed")::numeric / 360, 2) AS max_years_empld,
ROUND(AVG("Days Employed")::numeric / 360, 2) AS avg_years_empld
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY ROLLUP(
department,
position
)
;
"""
df_term_over_dptmnt_pstn = pd.read_sql(sql_quiery, conn, index_col=['department', 'position'])
df_term_over_dptmnt_pstn
| department | position | count | min_years_empld | median_years_empld | max_years_empld | avg_years_empld |
|---|---|---|---|---|---|---|
| Admin Offices | Accountant I | 3 | 3.21 | 3.94 | 9.21 | 5.45 |
| | Administrative Assistant | 2 | 0.16 | 1.39 | 2.61 | 1.39 |
| | Shared Services Manager | 1 | 1.92 | 1.92 | 1.92 | 1.92 |
| | Sr. Accountant | 2 | 2.82 | 5.92 | 9.02 | 5.92 |
| | [SUBTOTAL department] | 8 | 0.16 | 3.01 | 9.21 | 4.11 |
| Executive Office | President & CEO | 1 | 5.48 | 5.48 | 5.48 | 5.48 |
| | [SUBTOTAL department] | 1 | 5.48 | 5.48 | 5.48 | 5.48 |
| IT/IS | BI Developer | 4 | 0.61 | 0.70 | 1.17 | 0.79 |
| | BI Director | 1 | 1.24 | 1.24 | 1.24 | 1.24 |
| | CIO | 1 | 7.74 | 7.74 | 7.74 | 7.74 |
| | Data Architect | 1 | 0.90 | 0.90 | 0.90 | 0.90 |
| | Database Administrator | 8 | 2.70 | 2.88 | 3.09 | 2.87 |
| | IT Director | 1 | 6.71 | 6.71 | 6.71 | 6.71 |
| | IT Manager - DB | 1 | 4.92 | 4.92 | 4.92 | 4.92 |
| | IT Manager - Infra | 1 | 5.86 | 5.86 | 5.86 | 5.86 |
| | IT Manager - Support | 1 | 3.95 | 3.95 | 3.95 | 3.95 |
| | IT Support | 4 | 5.30 | 6.75 | 7.68 | 6.62 |
| | Network Engineer | 8 | 2.70 | 2.88 | 3.20 | 2.86 |
| | Senior BI Developer | 3 | 0.79 | 0.80 | 1.17 | 0.92 |
| | Sr. DBA | 1 | 1.43 | 1.43 | 1.43 | 1.43 |
| | Sr. Network Engineer | 5 | 1.43 | 3.09 | 3.09 | 2.68 |
| | [SUBTOTAL department] | 40 | 0.61 | 2.88 | 7.74 | 3.11 |
| Production | Director of Operations | 1 | 9.02 | 9.02 | 9.02 | 9.02 |
| | Production Manager | 9 | 1.86 | 5.23 | 9.01 | 5.03 |
| | Production Technician I | 84 | 1.41 | 4.33 | 10.21 | 4.62 |
| | Production Technician II | 31 | 1.37 | 3.83 | 7.35 | 4.20 |
| | [SUBTOTAL department] | 125 | 1.37 | 4.22 | 10.21 | 4.58 |
| Sales | Area Sales Manager | 24 | 1.41 | 4.39 | 12.05 | 4.93 |
| | Director of Sales | 1 | 3.61 | 3.61 | 3.61 | 3.61 |
| | Sales Manager | 2 | 3.58 | 3.60 | 3.61 | 3.60 |
| | [SUBTOTAL department] | 27 | 1.41 | 4.22 | 12.05 | 4.79 |
| Software Engineering | Software Engineer | 6 | 3.09 | 4.10 | 5.97 | 4.30 |
| | Software Engineering Manager | 1 | 6.38 | 6.38 | 6.38 | 6.38 |
| | [SUBTOTAL department] | 7 | 3.09 | 4.10 | 6.38 | 4.60 |
| [TOTAL] | [SUBTOTAL department] | 208 | 0.16 | 4.02 | 12.05 | 4.31 |
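`GROUP BY ROLLUP(department, position)` returns the per-position groups plus a subtotal row per department and a grand total. In the extra rows the rolled-up columns are NULL, which the query relabels with `COALESCE`. The sketch below emulates that for the `count` column in plain Python, using the Sales and Software Engineering counts from the table above; `rollup_counts` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter

# (department, position, number of employees) taken from the table above.
groups = [
    ("Sales", "Area Sales Manager", 24),
    ("Sales", "Director of Sales", 1),
    ("Sales", "Sales Manager", 2),
    ("Software Engineering", "Software Engineer", 6),
    ("Software Engineering", "Software Engineering Manager", 1),
]

def rollup_counts(groups):
    """Emulate COUNT(...) with GROUP BY ROLLUP(department, position)."""
    counts = Counter()
    for department, position, n in groups:
        counts[(department, position)] += n  # detail level
        counts[(department, None)] += n      # department subtotal (position is NULL)
        counts[(None, None)] += n            # grand total (both are NULL)
    return counts

rolled = rollup_counts(groups)
for (department, position), n in rolled.items():
    # Same relabelling as COALESCE(...) in the query.
    print(department or "[TOTAL]", "|", position or "[SUBTOTAL department]", "|", n)
```

The department subtotals match the table (27 for Sales, 7 for Software Engineering). The grand total here covers only the two departments in the excerpt, so it differs from the 208 in the `[TOTAL]` row.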
CONCLUSION
Across the company's current employees, time with the company ranges from almost 2 months to more than 12 years. Half of the employees have been with the company for about 4 years or less (the mean is 4 years and 4 months).
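The figures in this conclusion are the fractional years from the `[TOTAL]` row converted to years and months. A minimal sketch of that arithmetic, assuming 12 equal months per year; `years_to_years_months` is a hypothetical helper name.

```python
def years_to_years_months(years):
    """Split fractional years into whole years and months rounded to the nearest month."""
    total_months = round(years * 12)
    return divmod(total_months, 12)

print(years_to_years_months(0.16))   # (0, 2)   minimum: about 2 months
print(years_to_years_months(4.02))   # (4, 0)   median: about 4 years
print(years_to_years_months(4.31))   # (4, 4)   mean: about 4 years 4 months
print(years_to_years_months(12.05))  # (12, 1)  maximum: just over 12 years
```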
# Build a table for studying how time employed at the company relates to the recruitment source
sql_quiery = \
"""
WITH
term_source AS
(SELECT
"Employee Source",
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
COUNT("Employee Number") AS empl_count
FROM
hr_dataset
GROUP BY
"Employee Source",
"Days Employed"
ORDER BY
"Employee Source",
"Days Employed"
),
term_source_median AS
(SELECT
"Employee Source",
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "Years Employed")::numeric, 1) AS "term_median"
FROM
term_source
GROUP BY
"Employee Source"
)
SELECT
*
FROM
term_source
LEFT JOIN
term_source_median
USING ("Employee Source")
;
"""
dfg_term_over_source = pd.read_sql(sql_quiery, conn)
dfg_term_over_source
| | Employee Source | Years Employed | empl_count | term_median |
|---|---|---|---|---|
| 0 | Billboard | 0.2 | 1 | 4.3 |
| 1 | Billboard | 0.7 | 1 | 4.3 |
| 2 | Billboard | 0.9 | 1 | 4.3 |
| 3 | Billboard | 2.1 | 1 | 4.3 |
| 4 | Billboard | 2.9 | 1 | 4.3 |
| 5 | Billboard | 3.3 | 1 | 4.3 |
| 6 | Billboard | 4.1 | 1 | 4.3 |
| 7 | Billboard | 4.2 | 1 | 4.3 |
| 8 | Billboard | 4.3 | 1 | 4.3 |
| 9 | Billboard | 4.6 | 1 | 4.3 |
| 10 | Billboard | 4.8 | 1 | 4.3 |
| 11 | Billboard | 5.5 | 1 | 4.3 |
| 12 | Billboard | 5.9 | 1 | 4.3 |
| 13 | Billboard | 9.0 | 1 | 4.3 |
| 14 | Billboard | 10.0 | 1 | 4.3 |
| 15 | Billboard | 12.1 | 1 | 4.3 |
| 16 | Careerbuilder | 6.7 | 1 | 6.7 |
| 17 | Company Intranet - Partner | 1.2 | 1 | 1.2 |
| 18 | Diversity Job Fair | 0.0 | 1 | 3.7 |
| 19 | Diversity Job Fair | 0.0 | 1 | 3.7 |
| 20 | Diversity Job Fair | 0.3 | 1 | 3.7 |
| 21 | Diversity Job Fair | 0.5 | 1 | 3.7 |
| 22 | Diversity Job Fair | 1.1 | 1 | 3.7 |
| 23 | Diversity Job Fair | 1.2 | 1 | 3.7 |
| 24 | Diversity Job Fair | 1.4 | 1 | 3.7 |
| 25 | Diversity Job Fair | 2.0 | 1 | 3.7 |
| 26 | Diversity Job Fair | 2.6 | 1 | 3.7 |
| 27 | Diversity Job Fair | 2.8 | 1 | 3.7 |
| 28 | Diversity Job Fair | 3.1 | 1 | 3.7 |
| 29 | Diversity Job Fair | 3.3 | 1 | 3.7 |
| 30 | Diversity Job Fair | 3.5 | 1 | 3.7 |
| 31 | Diversity Job Fair | 3.6 | 1 | 3.7 |
| 32 | Diversity Job Fair | 3.7 | 1 | 3.7 |
| 33 | Diversity Job Fair | 3.9 | 1 | 3.7 |
| 34 | Diversity Job Fair | 3.9 | 1 | 3.7 |
| 35 | Diversity Job Fair | 4.0 | 1 | 3.7 |
| 36 | Diversity Job Fair | 4.1 | 1 | 3.7 |
| 37 | Diversity Job Fair | 4.1 | 1 | 3.7 |
| 38 | Diversity Job Fair | 4.5 | 1 | 3.7 |
| 39 | Diversity Job Fair | 4.7 | 1 | 3.7 |
| 40 | Diversity Job Fair | 5.9 | 1 | 3.7 |
| 41 | Diversity Job Fair | 6.0 | 1 | 3.7 |
| 42 | Diversity Job Fair | 6.5 | 1 | 3.7 |
| 43 | Diversity Job Fair | 6.8 | 1 | 3.7 |
| 44 | Diversity Job Fair | 6.9 | 1 | 3.7 |
| 45 | Diversity Job Fair | 8.5 | 1 | 3.7 |
| 46 | Diversity Job Fair | 9.2 | 1 | 3.7 |
| 47 | Employee Referral | 0.0 | 1 | 3.3 |
| 48 | Employee Referral | 1.4 | 1 | 3.3 |
| 49 | Employee Referral | 1.4 | 1 | 3.3 |
| 50 | Employee Referral | 1.4 | 1 | 3.3 |
| 51 | Employee Referral | 2.5 | 1 | 3.3 |
| 52 | Employee Referral | 2.7 | 4 | 3.3 |
| 53 | Employee Referral | 2.8 | 1 | 3.3 |
| 54 | Employee Referral | 2.9 | 3 | 3.3 |
| 55 | Employee Referral | 3.1 | 3 | 3.3 |
| 56 | Employee Referral | 3.2 | 1 | 3.3 |
| 57 | Employee Referral | 3.2 | 1 | 3.3 |
| 58 | Employee Referral | 3.3 | 1 | 3.3 |
| 59 | Employee Referral | 3.7 | 1 | 3.3 |
| 60 | Employee Referral | 3.9 | 1 | 3.3 |
| 61 | Employee Referral | 4.1 | 1 | 3.3 |
| 62 | Employee Referral | 4.2 | 2 | 3.3 |
| 63 | Employee Referral | 4.5 | 2 | 3.3 |
| 64 | Employee Referral | 5.0 | 1 | 3.3 |
| 65 | Employee Referral | 5.6 | 1 | 3.3 |
| 66 | Employee Referral | 5.8 | 1 | 3.3 |
| 67 | Employee Referral | 6.4 | 1 | 3.3 |
| 68 | Employee Referral | 7.7 | 1 | 3.3 |
| 69 | Glassdoor | 0.1 | 1 | 3.1 |
| 70 | Glassdoor | 0.6 | 1 | 3.1 |
| 71 | Glassdoor | 0.7 | 1 | 3.1 |
| 72 | Glassdoor | 1.2 | 1 | 3.1 |
| 73 | Glassdoor | 2.7 | 1 | 3.1 |
| 74 | Glassdoor | 2.9 | 2 | 3.1 |
| 75 | Glassdoor | 3.1 | 1 | 3.1 |
| 76 | Glassdoor | 4.1 | 1 | 3.1 |
| 77 | Glassdoor | 4.3 | 1 | 3.1 |
| 78 | Glassdoor | 5.1 | 1 | 3.1 |
| 79 | Glassdoor | 5.3 | 1 | 3.1 |
| 80 | Glassdoor | 5.4 | 1 | 3.1 |
| 81 | Glassdoor | 5.6 | 1 | 3.1 |
| 82 | Indeed | 0.6 | 2 | 0.8 |
| 83 | Indeed | 0.8 | 2 | 0.8 |
| 84 | Indeed | 0.8 | 1 | 0.8 |
| 85 | Indeed | 0.9 | 1 | 0.8 |
| 86 | Indeed | 1.2 | 2 | 0.8 |
| 87 | Information Session | 2.7 | 1 | 4.0 |
| 88 | Information Session | 3.5 | 1 | 4.0 |
| 89 | Information Session | 4.5 | 1 | 4.0 |
| 90 | Information Session | 6.6 | 1 | 4.0 |
| 91 | Internet Search | 0.2 | 1 | 3.6 |
| 92 | Internet Search | 3.2 | 2 | 3.6 |
| 93 | Internet Search | 3.6 | 1 | 3.6 |
| 94 | Internet Search | 5.4 | 1 | 3.6 |
| 95 | Internet Search | 5.6 | 1 | 3.6 |
| 96 | MBTA ads | 1.2 | 1 | 4.2 |
| 97 | MBTA ads | 2.2 | 1 | 4.2 |
| 98 | MBTA ads | 3.1 | 1 | 4.2 |
| 99 | MBTA ads | 3.1 | 1 | 4.2 |
| 100 | MBTA ads | 3.4 | 1 | 4.2 |
| 101 | MBTA ads | 3.6 | 1 | 4.2 |
| 102 | MBTA ads | 3.7 | 1 | 4.2 |
| 103 | MBTA ads | 3.8 | 1 | 4.2 |
| 104 | MBTA ads | 4.5 | 1 | 4.2 |
| 105 | MBTA ads | 4.5 | 1 | 4.2 |
| 106 | MBTA ads | 4.7 | 1 | 4.2 |
| 107 | MBTA ads | 5.5 | 1 | 4.2 |
| 108 | MBTA ads | 5.7 | 1 | 4.2 |
| 109 | MBTA ads | 6.0 | 1 | 4.2 |
| 110 | MBTA ads | 6.5 | 2 | 4.2 |
| 111 | MBTA ads | 6.6 | 1 | 4.2 |
| 112 | Monster.com | 0.2 | 1 | 3.6 |
| 113 | Monster.com | 0.4 | 1 | 3.6 |
| 114 | Monster.com | 1.2 | 1 | 3.6 |
| 115 | Monster.com | 1.9 | 1 | 3.6 |
| 116 | Monster.com | 1.9 | 1 | 3.6 |
| 117 | Monster.com | 2.1 | 1 | 3.6 |
| 118 | Monster.com | 2.7 | 1 | 3.6 |
| 119 | Monster.com | 2.8 | 1 | 3.6 |
| 120 | Monster.com | 2.9 | 1 | 3.6 |
| 121 | Monster.com | 3.1 | 1 | 3.6 |
| 122 | Monster.com | 3.2 | 1 | 3.6 |
| 123 | Monster.com | 3.6 | 2 | 3.6 |
| 124 | Monster.com | 4.0 | 1 | 3.6 |
| 125 | Monster.com | 4.5 | 1 | 3.6 |
| 126 | Monster.com | 5.0 | 1 | 3.6 |
| 127 | Monster.com | 5.3 | 1 | 3.6 |
| 128 | Monster.com | 5.5 | 1 | 3.6 |
| 129 | Monster.com | 5.7 | 1 | 3.6 |
| 130 | Monster.com | 6.0 | 1 | 3.6 |
| 131 | Monster.com | 6.3 | 1 | 3.6 |
| 132 | Monster.com | 6.7 | 1 | 3.6 |
| 133 | Monster.com | 7.3 | 1 | 3.6 |
| 134 | Monster.com | 7.5 | 1 | 3.6 |
| 135 | Newspager/Magazine | 0.1 | 1 | 3.8 |
| 136 | Newspager/Magazine | 0.3 | 1 | 3.8 |
| 137 | Newspager/Magazine | 1.2 | 1 | 3.8 |
| 138 | Newspager/Magazine | 1.3 | 1 | 3.8 |
| 139 | Newspager/Magazine | 1.4 | 1 | 3.8 |
| 140 | Newspager/Magazine | 2.4 | 1 | 3.8 |
| 141 | Newspager/Magazine | 2.7 | 1 | 3.8 |
| 142 | Newspager/Magazine | 3.2 | 1 | 3.8 |
| 143 | Newspager/Magazine | 3.7 | 1 | 3.8 |
| 144 | Newspager/Magazine | 3.8 | 1 | 3.8 |
| 145 | Newspager/Magazine | 3.9 | 1 | 3.8 |
| 146 | Newspager/Magazine | 4.1 | 1 | 3.8 |
| 147 | Newspager/Magazine | 4.3 | 1 | 3.8 |
| 148 | Newspager/Magazine | 4.5 | 1 | 3.8 |
| 149 | Newspager/Magazine | 5.1 | 1 | 3.8 |
| 150 | Newspager/Magazine | 5.5 | 1 | 3.8 |
| 151 | Newspager/Magazine | 5.9 | 1 | 3.8 |
| 152 | Newspager/Magazine | 6.1 | 1 | 3.8 |
| 153 | On-campus Recruiting | 0.2 | 1 | 3.9 |
| 154 | On-campus Recruiting | 1.4 | 1 | 3.9 |
| 155 | On-campus Recruiting | 3.2 | 2 | 3.9 |
| 156 | On-campus Recruiting | 3.6 | 1 | 3.9 |
| 157 | On-campus Recruiting | 3.8 | 2 | 3.9 |
| 158 | On-campus Recruiting | 3.9 | 1 | 3.9 |
| 159 | On-campus Recruiting | 4.1 | 1 | 3.9 |
| 160 | On-campus Recruiting | 4.2 | 1 | 3.9 |
| 161 | On-campus Recruiting | 4.3 | 1 | 3.9 |
| 162 | On-campus Recruiting | 4.6 | 1 | 3.9 |
| 163 | On-line Web application | 0.5 | 1 | 0.5 |
| 164 | Other | 1.1 | 1 | 5.5 |
| 165 | Other | 1.6 | 1 | 5.5 |
| 166 | Other | 2.5 | 1 | 5.5 |
| 167 | Other | 4.5 | 1 | 5.5 |
| 168 | Other | 6.5 | 1 | 5.5 |
| 169 | Other | 6.6 | 1 | 5.5 |
| 170 | Other | 7.0 | 1 | 5.5 |
| 171 | Other | 9.0 | 2 | 5.5 |
| 172 | Pay Per Click | 0.0 | 1 | 0.0 |
| 173 | Pay Per Click - Google | 0.2 | 1 | 3.6 |
| 174 | Pay Per Click - Google | 0.5 | 1 | 3.6 |
| 175 | Pay Per Click - Google | 0.7 | 1 | 3.6 |
| 176 | Pay Per Click - Google | 1.9 | 1 | 3.6 |
| 177 | Pay Per Click - Google | 2.7 | 1 | 3.6 |
| 178 | Pay Per Click - Google | 2.8 | 1 | 3.6 |
| 179 | Pay Per Click - Google | 3.2 | 1 | 3.6 |
| 180 | Pay Per Click - Google | 3.2 | 1 | 3.6 |
| 181 | Pay Per Click - Google | 3.3 | 1 | 3.6 |
| 182 | Pay Per Click - Google | 3.4 | 1 | 3.6 |
| 183 | Pay Per Click - Google | 3.6 | 1 | 3.6 |
| 184 | Pay Per Click - Google | 3.6 | 1 | 3.6 |
| 185 | Pay Per Click - Google | 4.1 | 1 | 3.6 |
| 186 | Pay Per Click - Google | 4.2 | 1 | 3.6 |
| 187 | Pay Per Click - Google | 4.3 | 1 | 3.6 |
| 188 | Pay Per Click - Google | 4.5 | 1 | 3.6 |
| 189 | Pay Per Click - Google | 5.0 | 1 | 3.6 |
| 190 | Pay Per Click - Google | 5.1 | 1 | 3.6 |
| 191 | Pay Per Click - Google | 5.5 | 1 | 3.6 |
| 192 | Pay Per Click - Google | 6.2 | 1 | 3.6 |
| 193 | Pay Per Click - Google | 7.0 | 1 | 3.6 |
| 194 | Professional Society | 1.2 | 1 | 3.8 |
| 195 | Professional Society | 2.1 | 1 | 3.8 |
| 196 | Professional Society | 2.6 | 1 | 3.8 |
| 197 | Professional Society | 2.7 | 1 | 3.8 |
| 198 | Professional Society | 3.2 | 1 | 3.8 |
| 199 | Professional Society | 3.2 | 1 | 3.8 |
| 200 | Professional Society | 3.4 | 4 | 3.8 |
| 201 | Professional Society | 3.6 | 1 | 3.8 |
| 202 | Professional Society | 3.8 | 1 | 3.8 |
| 203 | Professional Society | 3.9 | 1 | 3.8 |
| 204 | Professional Society | 4.3 | 1 | 3.8 |
| 205 | Professional Society | 4.5 | 1 | 3.8 |
| 206 | Professional Society | 4.7 | 1 | 3.8 |
| 207 | Professional Society | 4.9 | 1 | 3.8 |
| 208 | Professional Society | 6.1 | 1 | 3.8 |
| 209 | Professional Society | 6.7 | 1 | 3.8 |
| 210 | Professional Society | 7.3 | 1 | 3.8 |
| 211 | Search Engine - Google Bing Yahoo | 0.1 | 1 | 3.2 |
| 212 | Search Engine - Google Bing Yahoo | 0.3 | 1 | 3.2 |
| 213 | Search Engine - Google Bing Yahoo | 1.2 | 1 | 3.2 |
| 214 | Search Engine - Google Bing Yahoo | 1.2 | 1 | 3.2 |
| 215 | Search Engine - Google Bing Yahoo | 1.2 | 1 | 3.2 |
| 216 | Search Engine - Google Bing Yahoo | 1.4 | 1 | 3.2 |
| 217 | Search Engine - Google Bing Yahoo | 1.4 | 1 | 3.2 |
| 218 | Search Engine - Google Bing Yahoo | 2.0 | 1 | 3.2 |
| 219 | Search Engine - Google Bing Yahoo | 2.1 | 1 | 3.2 |
| 220 | Search Engine - Google Bing Yahoo | 2.2 | 1 | 3.2 |
| 221 | Search Engine - Google Bing Yahoo | 2.6 | 1 | 3.2 |
| 222 | Search Engine - Google Bing Yahoo | 3.2 | 1 | 3.2 |
| 223 | Search Engine - Google Bing Yahoo | 3.2 | 1 | 3.2 |
| 224 | Search Engine - Google Bing Yahoo | 3.3 | 1 | 3.2 |
| 225 | Search Engine - Google Bing Yahoo | 3.7 | 1 | 3.2 |
| 226 | Search Engine - Google Bing Yahoo | 4.1 | 1 | 3.2 |
| 227 | Search Engine - Google Bing Yahoo | 4.2 | 1 | 3.2 |
| 228 | Search Engine - Google Bing Yahoo | 4.7 | 1 | 3.2 |
| 229 | Search Engine - Google Bing Yahoo | 5.6 | 1 | 3.2 |
| 230 | Search Engine - Google Bing Yahoo | 6.1 | 1 | 3.2 |
| 231 | Search Engine - Google Bing Yahoo | 6.4 | 1 | 3.2 |
| 232 | Search Engine - Google Bing Yahoo | 6.7 | 1 | 3.2 |
| 233 | Search Engine - Google Bing Yahoo | 7.2 | 1 | 3.2 |
| 234 | Search Engine - Google Bing Yahoo | 7.7 | 1 | 3.2 |
| 235 | Search Engine - Google Bing Yahoo | 10.2 | 1 | 3.2 |
| 236 | Social Networks - Facebook Twitter etc | 0.1 | 1 | 3.0 |
| 237 | Social Networks - Facebook Twitter etc | 0.3 | 1 | 3.0 |
| 238 | Social Networks - Facebook Twitter etc | 0.5 | 1 | 3.0 |
| 239 | Social Networks - Facebook Twitter etc | 2.1 | 1 | 3.0 |
| 240 | Social Networks - Facebook Twitter etc | 2.5 | 1 | 3.0 |
| 241 | Social Networks - Facebook Twitter etc | 3.0 | 1 | 3.0 |
| 242 | Social Networks - Facebook Twitter etc | 3.0 | 1 | 3.0 |
| 243 | Social Networks - Facebook Twitter etc | 4.4 | 1 | 3.0 |
| 244 | Social Networks - Facebook Twitter etc | 5.4 | 1 | 3.0 |
| 245 | Social Networks - Facebook Twitter etc | 5.9 | 1 | 3.0 |
| 246 | Social Networks - Facebook Twitter etc | 6.0 | 1 | 3.0 |
| 247 | Vendor Referral | 0.8 | 1 | 2.4 |
| 248 | Vendor Referral | 0.9 | 1 | 2.4 |
| 249 | Vendor Referral | 1.4 | 1 | 2.4 |
| 250 | Vendor Referral | 1.5 | 1 | 2.4 |
| 251 | Vendor Referral | 1.5 | 1 | 2.4 |
| 252 | Vendor Referral | 1.6 | 1 | 2.4 |
| 253 | Vendor Referral | 2.0 | 1 | 2.4 |
| 254 | Vendor Referral | 2.7 | 1 | 2.4 |
| 255 | Vendor Referral | 3.1 | 2 | 2.4 |
| 256 | Vendor Referral | 3.2 | 1 | 2.4 |
| 257 | Vendor Referral | 3.4 | 1 | 2.4 |
| 258 | Vendor Referral | 4.5 | 1 | 2.4 |
| 259 | Vendor Referral | 5.2 | 1 | 2.4 |
| 260 | Vendor Referral | 7.7 | 1 | 2.4 |
| 261 | Website Banner Ads | 1.4 | 1 | 4.1 |
| 262 | Website Banner Ads | 2.6 | 1 | 4.1 |
| 263 | Website Banner Ads | 2.8 | 1 | 4.1 |
| 264 | Website Banner Ads | 2.9 | 1 | 4.1 |
| 265 | Website Banner Ads | 3.6 | 1 | 4.1 |
| 266 | Website Banner Ads | 3.9 | 2 | 4.1 |
| 267 | Website Banner Ads | 4.2 | 1 | 4.1 |
| 268 | Website Banner Ads | 4.4 | 1 | 4.1 |
| 269 | Website Banner Ads | 5.7 | 1 | 4.1 |
| 270 | Website Banner Ads | 5.8 | 1 | 4.1 |
| 271 | Website Banner Ads | 5.9 | 1 | 4.1 |
| 272 | Website Banner Ads | 6.0 | 1 | 4.1 |
| 273 | Word of Mouth | 0.0 | 1 | 2.5 |
| 274 | Word of Mouth | 0.2 | 1 | 2.5 |
| 275 | Word of Mouth | 0.8 | 1 | 2.5 |
| 276 | Word of Mouth | 1.1 | 1 | 2.5 |
| 277 | Word of Mouth | 1.6 | 1 | 2.5 |
| 278 | Word of Mouth | 1.7 | 1 | 2.5 |
| 279 | Word of Mouth | 2.5 | 1 | 2.5 |
| 280 | Word of Mouth | 2.9 | 1 | 2.5 |
| 281 | Word of Mouth | 3.2 | 1 | 2.5 |
| 282 | Word of Mouth | 4.2 | 1 | 2.5 |
| 283 | Word of Mouth | 5.7 | 1 | 2.5 |
| 284 | Word of Mouth | 6.1 | 1 | 2.5 |
| 285 | Word of Mouth | 7.0 | 1 | 2.5 |
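Note that `term_median` is taken over the rows of `term_source`, which holds one row per distinct "Days Employed" value within a source, so `empl_count` does not enter the median. The sketch below, in plain Python, uses the rows for the source `Other` from the table above. It compares that row-level median with a median in which each row is repeated `empl_count` times. The helper names `row_level_median` and `employee_weighted_median` are hypothetical and not part of the notebook.

```python
from statistics import median

# (Years Employed, empl_count) rows for the source "Other", copied from the table above
other_rows = [(1.1, 1), (1.6, 1), (2.5, 1), (4.5, 1),
              (6.5, 1), (6.6, 1), (7.0, 1), (9.0, 2)]


def row_level_median(rows):
    """Median over table rows, ignoring empl_count (as the SQL query appears to do)."""
    return median(years for years, _ in rows)


def employee_weighted_median(rows):
    """Median over employees: each row is repeated empl_count times."""
    expanded = [years for years, count in rows for _ in range(count)]
    return median(expanded)


print(row_level_median(other_rows))          # reproduces term_median for "Other"
print(employee_weighted_median(other_rows))  # median over all nine employees
```

If the per-employee median is the intended statistic, one option would be to compute it directly from `hr_dataset`, grouped by "Employee Source" only.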
# Plot the distribution of tenure in the company by recruitment source
g = sns.displot(data=dfg_term_over_source,
    x="Years Employed",
    hue="Employee Source",
    col="Employee Source",
    col_wrap=2,
    binwidth=0.25,
    height=3,
    aspect=2,
    palette="Paired",
    # multiple="layer",
    # element="step",
    alpha=1,
    facet_kws=dict(sharex=True, sharey=True)
)
# Define the X axis scale:
scale_step = 1  # X axis tick step; it can be changed
# Main range of the scale
scale_span = list(range((int(dfg_term_over_source['Years Employed'].min() / scale_step) * scale_step),
    int(dfg_term_over_source['Years Employed'].max()),
    scale_step))
# If necessary, add the first and last values to the scale (if they are not already in it)
if dfg_term_over_source['Years Employed'].min() < scale_span[0]:
    scale_span = [dfg_term_over_source['Years Employed'].min()] + scale_span
if dfg_term_over_source['Years Employed'].max() > scale_span[-1]:
    scale_span = scale_span + [dfg_term_over_source['Years Employed'].max()]
plt.xticks(ticks=scale_span, fontsize=10, rotation=0)  # set the X axis ticks and their label size
plt.yticks(ticks=list(range(0, 5, 1)), fontsize=10)  # set the tick step and label size for the Y axis
g.set_xlabels(fontsize=12)  # X axis label font size
g.set_ylabels(fontsize=12)  # Y axis label font size
# Figure title:
g.fig.suptitle("Distribution of tenure in the company by recruitment source",
    fontsize=20, x=0.4125, y=1.0125)
# Add a vertical line marking the median tenure for each source
# To do this, define a function that draws the vertical line and its label
def years_employed_lines(x, **kwargs):
    # Draw the median line from the values passed in x: colour, width, style
    plt.axvline(x.unique()[0], color='navy', linewidth=2, linestyle=':')
    # Label the median line
    plt.annotate(
        text=f"median {x.unique()[0]}",  # annotation text for the median line
        xy=(x.unique()[0]-0.15, 3.5),  # label position in data coordinates
        horizontalalignment='center',  # horizontal text alignment
        verticalalignment='top',  # vertical text alignment
        rotation="vertical",  # label rotation
        color='navy',  # label colour
        alpha=1,
        fontsize=12
    )
# Apply the line-drawing function to every facet,
# passing it the per-source median values
g.map(years_employed_lines, 'term_median')
plt.show()
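The tick-building logic of this cell can be traced in isolation. The sketch below wraps it in a hypothetical helper, `tick_positions`, and runs it on the smallest and largest "Years Employed" values in the table above (0.0 and 12.1). It only reproduces the arithmetic and does not touch the plot.

```python
def tick_positions(lo, hi, step=1):
    """Reproduce the tick logic of the plotting cell for a value range [lo, hi]."""
    # Whole-step ticks from the rounded-down minimum up to, but excluding, int(hi)
    ticks = list(range(int(lo / step) * step, int(hi), step))
    # Add the end points if the whole-step ticks do not already cover them
    if lo < ticks[0]:
        ticks = [lo] + ticks
    if hi > ticks[-1]:
        ticks = ticks + [hi]
    return ticks


# Smallest and largest "Years Employed" values in the table above
print(tick_positions(0.0, 12.1))
```

Because `range` excludes its end value, the whole-number tick at 12 is not generated for these inputs: the scale runs from 0 to 11 and then jumps to 12.1.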
CONCLUSION
# Build a DataFrame to study the distribution of tenure in the company by hire date
sql_quiery = \
"""
WITH
term_dates AS
(SELECT
"Date of Hire",
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed"
FROM
hr_dataset
ORDER BY
"Date of Hire"
),
minimax_term_dates AS
(SELECT
"Date of Hire",
MIN("Years Employed") AS "Min Years Employed",
MAX("Years Employed") AS "Max Years Employed"
FROM
term_dates
GROUP BY
"Date of Hire"
)
SELECT
*
FROM
term_dates
LEFT JOIN
minimax_term_dates
USING ("Date of Hire")
;
"""
dfg_term_over_date_hire = pd.read_sql(sql_quiery, conn, index_col="Date of Hire")
dfg_term_over_date_hire
| Date of Hire | Years Employed | Min Years Employed | Max Years Employed |
|---|---|---|---|
| 2006-01-09 | 12.1 | 12.1 | 12.1 |
| 2007-06-25 | 3.2 | 3.2 | 3.2 |
| 2007-11-05 | 10.2 | 10.2 | 10.2 |
| 2008-01-07 | 10.0 | 10.0 | 10.0 |
| 2008-09-02 | 7.2 | 7.2 | 7.2 |
| 2008-10-27 | 9.2 | 9.2 | 9.2 |
| 2009-01-05 | 9.0 | 1.6 | 9.0 |
| 2009-01-05 | 1.6 | 1.6 | 9.0 |
| 2009-01-05 | 9.0 | 1.6 | 9.0 |
| 2009-01-08 | 9.0 | 9.0 | 9.0 |
| 2009-04-27 | 4.0 | 4.0 | 4.0 |
| 2009-07-06 | 8.5 | 8.5 | 8.5 |
| 2009-10-26 | 5.5 | 5.5 | 5.5 |
| 2010-04-10 | 7.7 | 7.7 | 7.7 |
| 2010-04-26 | 1.1 | 1.1 | 7.7 |
| 2010-04-26 | 7.7 | 1.1 | 7.7 |
| 2010-05-01 | 7.7 | 7.7 | 7.7 |
| 2010-07-20 | 7.5 | 7.5 | 7.5 |
| 2010-08-30 | 1.1 | 1.1 | 7.3 |
| 2010-08-30 | 7.3 | 1.1 | 7.3 |
| 2010-09-27 | 7.3 | 7.3 | 7.3 |
| 2010-10-25 | 5.6 | 5.6 | 5.6 |
| 2011-01-10 | 3.3 | 0.0 | 7.0 |
| 2011-01-10 | 7.0 | 0.0 | 7.0 |
| 2011-01-10 | 2.1 | 0.0 | 7.0 |
| 2011-01-10 | 0.0 | 0.0 | 7.0 |
| 2011-01-10 | 5.3 | 0.0 | 7.0 |
| 2011-01-10 | 0.3 | 0.0 | 7.0 |
| 2011-01-10 | 0.3 | 0.0 | 7.0 |
| 2011-01-10 | 7.0 | 0.0 | 7.0 |
| 2011-01-10 | 2.0 | 0.0 | 7.0 |
| 2011-01-10 | 7.0 | 0.0 | 7.0 |
| 2011-01-10 | 5.4 | 0.0 | 7.0 |
| 2011-01-10 | 1.5 | 0.0 | 7.0 |
| 2011-01-10 | 5.1 | 0.0 | 7.0 |
| 2011-01-10 | 5.0 | 0.0 | 7.0 |
| 2011-01-21 | 6.9 | 6.9 | 6.9 |
| 2011-02-07 | 3.0 | 3.0 | 3.0 |
| 2011-02-21 | 2.9 | 0.5 | 4.5 |
| 2011-02-21 | 0.5 | 0.5 | 4.5 |
| 2011-02-21 | 2.1 | 0.5 | 4.5 |
| 2011-02-21 | 1.6 | 0.5 | 4.5 |
| 2011-02-21 | 4.5 | 0.5 | 4.5 |
| 2011-03-07 | 3.7 | 3.7 | 6.8 |
| 2011-03-07 | 6.8 | 3.7 | 6.8 |
| 2011-04-04 | 2.2 | 0.8 | 6.7 |
| 2011-04-04 | 6.7 | 0.8 | 6.7 |
| 2011-04-04 | 0.8 | 0.8 | 6.7 |
| 2011-04-04 | 4.7 | 0.8 | 6.7 |
| 2011-04-04 | 6.7 | 0.8 | 6.7 |
| 2011-04-04 | 6.7 | 0.8 | 6.7 |
| 2011-04-04 | 1.4 | 0.8 | 6.7 |
| 2011-04-15 | 6.7 | 6.7 | 6.7 |
| 2011-05-02 | 2.1 | 2.1 | 2.1 |
| 2011-05-16 | 2.1 | 0.1 | 6.6 |
| 2011-05-16 | 4.4 | 0.1 | 6.6 |
| 2011-05-16 | 0.7 | 0.1 | 6.6 |
| 2011-05-16 | 4.1 | 0.1 | 6.6 |
| 2011-05-16 | 1.2 | 0.1 | 6.6 |
| 2011-05-16 | 4.5 | 0.1 | 6.6 |
| 2011-05-16 | 0.1 | 0.1 | 6.6 |
| 2011-05-16 | 6.6 | 0.1 | 6.6 |
| 2011-05-16 | 1.7 | 0.1 | 6.6 |
| 2011-05-16 | 4.7 | 0.1 | 6.6 |
| 2011-05-31 | 6.6 | 6.6 | 6.6 |
| 2011-06-10 | 6.6 | 6.6 | 6.6 |
| 2011-06-27 | 4.5 | 4.5 | 4.5 |
| 2011-07-05 | 6.5 | 0.1 | 6.5 |
| 2011-07-05 | 0.2 | 0.1 | 6.5 |
| 2011-07-05 | 1.4 | 0.1 | 6.5 |
| 2011-07-05 | 0.1 | 0.1 | 6.5 |
| 2011-07-05 | 0.2 | 0.1 | 6.5 |
| 2011-07-05 | 0.6 | 0.1 | 6.5 |
| 2011-07-05 | 6.5 | 0.1 | 6.5 |
| 2011-07-05 | 0.2 | 0.1 | 6.5 |
| 2011-07-05 | 6.5 | 0.1 | 6.5 |
| 2011-07-05 | 1.2 | 0.1 | 6.5 |
| 2011-07-11 | 6.5 | 0.2 | 6.5 |
| 2011-07-11 | 0.2 | 0.2 | 6.5 |
| 2011-07-11 | 1.2 | 0.2 | 6.5 |
| 2011-08-01 | 6.4 | 6.4 | 6.4 |
| 2011-08-15 | 3.1 | 0.7 | 6.4 |
| 2011-08-15 | 3.0 | 0.7 | 6.4 |
| 2011-08-15 | 0.7 | 0.7 | 6.4 |
| 2011-08-15 | 6.4 | 0.7 | 6.4 |
| 2011-09-06 | 6.3 | 6.3 | 6.3 |
| 2011-09-26 | 1.9 | 0.1 | 4.4 |
| 2011-09-26 | 0.3 | 0.1 | 4.4 |
| 2011-09-26 | 0.5 | 0.1 | 4.4 |
| 2011-09-26 | 3.7 | 0.1 | 4.4 |
| 2011-09-26 | 0.3 | 0.1 | 4.4 |
| 2011-09-26 | 2.6 | 0.1 | 4.4 |
| 2011-09-26 | 2.0 | 0.1 | 4.4 |
| 2011-09-26 | 0.1 | 0.1 | 4.4 |
| 2011-09-26 | 4.4 | 0.1 | 4.4 |
| 2011-10-03 | 6.2 | 6.2 | 6.2 |
| 2011-11-07 | 0.0 | 0.0 | 6.1 |
| 2011-11-07 | 4.1 | 0.0 | 6.1 |
| 2011-11-07 | 6.1 | 0.0 | 6.1 |
| 2011-11-07 | 3.9 | 0.0 | 6.1 |
| 2011-11-07 | 2.6 | 0.0 | 6.1 |
| 2011-11-07 | 2.5 | 0.0 | 6.1 |
| 2011-11-07 | 4.5 | 0.0 | 6.1 |
| 2011-11-07 | 6.1 | 0.0 | 6.1 |
| 2011-11-28 | 6.1 | 6.1 | 6.1 |
| 2011-11-28 | 6.1 | 6.1 | 6.1 |
| 2012-01-09 | 3.5 | 3.5 | 6.0 |
| 2012-01-09 | 6.0 | 3.5 | 6.0 |
| 2012-01-09 | 3.9 | 3.5 | 6.0 |
| 2012-01-09 | 6.0 | 3.5 | 6.0 |
| 2012-01-09 | 4.0 | 3.5 | 6.0 |
| 2012-01-09 | 6.0 | 3.5 | 6.0 |
| 2012-01-09 | 6.0 | 3.5 | 6.0 |
| 2012-01-09 | 6.0 | 3.5 | 6.0 |
| 2012-02-15 | 5.9 | 5.9 | 5.9 |
| 2012-02-20 | 5.9 | 5.9 | 5.9 |
| 2012-02-20 | 5.9 | 5.9 | 5.9 |
| 2012-02-20 | 5.9 | 5.9 | 5.9 |
| 2012-02-20 | 5.9 | 5.9 | 5.9 |
| 2012-03-05 | 5.8 | 5.8 | 5.8 |
| 2012-03-05 | 5.8 | 5.8 | 5.8 |
| 2012-04-02 | 5.7 | 0.5 | 5.7 |
| 2012-04-02 | 5.7 | 0.5 | 5.7 |
| 2012-04-02 | 3.3 | 0.5 | 5.7 |
| 2012-04-02 | 2.5 | 0.5 | 5.7 |
| 2012-04-02 | 5.7 | 0.5 | 5.7 |
| 2012-04-02 | 1.1 | 0.5 | 5.7 |
| 2012-04-02 | 3.7 | 0.5 | 5.7 |
| 2012-04-02 | 0.5 | 0.5 | 5.7 |
| 2012-04-02 | 1.2 | 0.5 | 5.7 |
| 2012-04-30 | 5.7 | 5.7 | 5.7 |
| 2012-05-14 | 5.6 | 1.3 | 5.6 |
| 2012-05-14 | 5.6 | 1.3 | 5.6 |
| 2012-05-14 | 5.6 | 1.3 | 5.6 |
| 2012-05-14 | 1.3 | 1.3 | 5.6 |
| 2012-07-02 | 5.5 | 5.5 | 5.5 |
| 2012-07-02 | 5.5 | 5.5 | 5.5 |
| 2012-07-02 | 5.5 | 5.5 | 5.5 |
| 2012-07-09 | 5.5 | 5.5 | 5.5 |
| 2012-08-13 | 5.4 | 3.1 | 5.4 |
| 2012-08-13 | 3.1 | 3.1 | 5.4 |
| 2012-08-13 | 3.5 | 3.1 | 5.4 |
| 2012-08-16 | 5.4 | 5.4 | 5.4 |
| 2012-09-05 | 5.3 | 5.3 | 5.3 |
| 2012-09-24 | 0.0 | 0.0 | 0.7 |
| 2012-09-24 | 0.5 | 0.0 | 0.7 |
| 2012-09-24 | 0.7 | 0.0 | 0.7 |
| 2012-10-02 | 5.2 | 5.2 | 5.2 |
| 2012-11-05 | 5.1 | 5.1 | 5.1 |
| 2012-11-05 | 5.1 | 5.1 | 5.1 |
| 2013-01-07 | 5.0 | 1.2 | 5.0 |
| 2013-01-07 | 1.2 | 1.2 | 5.0 |
| 2013-01-07 | 5.0 | 1.2 | 5.0 |
| 2013-01-07 | 3.2 | 1.2 | 5.0 |
| 2013-01-20 | 4.9 | 4.9 | 4.9 |
| 2013-02-18 | 4.8 | 1.2 | 4.8 |
| 2013-02-18 | 1.2 | 1.2 | 4.8 |
| 2013-04-01 | 3.2 | 3.2 | 4.7 |
| 2013-04-01 | 4.7 | 3.2 | 4.7 |
| 2013-04-01 | 4.7 | 3.2 | 4.7 |
| 2013-05-13 | 4.6 | 2.2 | 4.6 |
| 2013-05-13 | 4.6 | 2.2 | 4.6 |
| 2013-05-13 | 2.2 | 2.2 | 4.6 |
| 2013-07-08 | 0.2 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-07-08 | 4.5 | 0.2 | 4.5 |
| 2013-08-19 | 4.3 | 4.3 | 4.3 |
| 2013-08-19 | 4.3 | 4.3 | 4.3 |
| 2013-08-19 | 4.3 | 4.3 | 4.3 |
| 2013-08-19 | 4.3 | 4.3 | 4.3 |
| 2013-08-19 | 4.3 | 4.3 | 4.3 |
| 2013-08-19 | 4.3 | 4.3 | 4.3 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-09-30 | 0.9 | 0.9 | 4.2 |
| 2013-09-30 | 4.2 | 0.9 | 4.2 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2013-11-11 | 4.1 | 4.1 | 4.1 |
| 2014-01-05 | 3.9 | 3.9 | 3.9 |
| 2014-01-06 | 3.9 | 3.9 | 3.9 |
| 2014-01-06 | 3.9 | 3.9 | 3.9 |
| 2014-01-06 | 3.9 | 3.9 | 3.9 |
| 2014-01-06 | 3.9 | 3.9 | 3.9 |
| 2014-01-06 | 3.9 | 3.9 | 3.9 |
| 2014-02-17 | 3.8 | 0.0 | 3.8 |
| 2014-02-17 | 3.8 | 0.0 | 3.8 |
| 2014-02-17 | 3.8 | 0.0 | 3.8 |
| 2014-02-17 | 2.0 | 0.0 | 3.8 |
| 2014-02-17 | 3.8 | 0.0 | 3.8 |
| 2014-02-17 | 0.0 | 0.0 | 3.8 |
| 2014-02-17 | 3.8 | 0.0 | 3.8 |
| 2014-03-31 | 2.1 | 2.1 | 3.7 |
| 2014-03-31 | 3.7 | 2.1 | 3.7 |
| 2014-03-31 | 3.7 | 2.1 | 3.7 |
| 2014-05-05 | 3.6 | 3.6 | 3.6 |
| 2014-05-05 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-12 | 3.6 | 3.6 | 3.6 |
| 2014-05-18 | 3.6 | 3.6 | 3.6 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 1.2 | 1.2 | 3.4 |
| 2014-07-07 | 3.4 | 1.2 | 3.4 |
| 2014-07-07 | 1.2 | 1.2 | 3.4 |
| 2014-08-18 | 3.3 | 3.3 | 3.3 |
| 2014-08-18 | 3.3 | 3.3 | 3.3 |
| 2014-08-18 | 3.3 | 3.3 | 3.3 |
| 2014-09-18 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-29 | 3.2 | 3.2 | 3.2 |
| 2014-09-30 | 3.2 | 3.2 | 3.2 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-11-10 | 3.1 | 3.1 | 3.1 |
| 2014-12-01 | 1.4 | 1.4 | 1.4 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 0.9 | 0.4 | 2.9 |
| 2015-01-05 | 0.4 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 0.8 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-01-05 | 2.9 | 0.4 | 2.9 |
| 2015-02-16 | 2.8 | 0.0 | 2.8 |
| 2015-02-16 | 2.8 | 0.0 | 2.8 |
| 2015-02-16 | 2.8 | 0.0 | 2.8 |
| 2015-02-16 | 0.0 | 0.0 | 2.8 |
| 2015-02-16 | 2.8 | 0.0 | 2.8 |
| 2015-02-16 | 0.1 | 0.0 | 2.8 |
| 2015-02-16 | 2.8 | 0.0 | 2.8 |
| 2015-02-16 | 0.2 | 0.0 | 2.8 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 1.2 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-03-30 | 2.7 | 1.2 | 2.7 |
| 2015-05-01 | 2.6 | 2.6 | 2.6 |
| 2015-05-11 | 2.6 | 2.6 | 2.6 |
| 2015-06-02 | 2.5 | 2.5 | 2.5 |
| 2015-06-05 | 2.5 | 2.5 | 2.5 |
| 2015-07-05 | 2.4 | 2.4 | 2.4 |
| 2016-01-05 | 1.9 | 1.9 | 1.9 |
| 2016-01-28 | 1.9 | 1.9 | 1.9 |
| 2016-05-11 | 1.6 | 1.6 | 1.6 |
| 2016-06-06 | 1.5 | 1.5 | 1.5 |
| 2016-06-30 | 1.4 | 1.4 | 1.4 |
| 2016-06-30 | 1.4 | 1.4 | 1.4 |
| 2016-07-04 | 1.4 | 1.4 | 1.4 |
| 2016-07-06 | 1.4 | 1.4 | 1.4 |
| 2016-07-06 | 1.4 | 1.4 | 1.4 |
| 2016-07-06 | 1.4 | 1.4 | 1.4 |
| 2016-07-21 | 1.4 | 1.4 | 1.4 |
| 2016-09-06 | 1.2 | 1.2 | 1.2 |
| 2016-10-02 | 1.2 | 1.2 | 1.2 |
| 2016-10-02 | 1.2 | 1.2 | 1.2 |
| 2017-01-07 | 0.9 | 0.9 | 0.9 |
| 2017-02-10 | 0.8 | 0.8 | 0.8 |
| 2017-02-15 | 0.8 | 0.8 | 0.8 |
| 2017-02-15 | 0.8 | 0.8 | 0.8 |
| 2017-04-20 | 0.6 | 0.6 | 0.6 |
| 2017-04-20 | 0.6 | 0.6 | 0.6 |
# Plot tenure in the company against the month and year of hire
g = sns.relplot(data=dfg_term_over_date_hire,
    # x="Date of Hire",
    # y="Years Employed",
    height=5,
    aspect=2.5,
    palette="tab10",
    alpha=1,
    # ci=True,
    kind="line",
    # marker='o'
)
# Define the ticks as a pandas date range with quarterly intervals (for readability):
scale_span = pd.date_range(start=dfg_term_over_date_hire.index.min(),
    end=dfg_term_over_date_hire.index.max(),
    freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90)  # set the X axis ticks and their label size
plt.yticks(fontsize=14)  # set the label size for the Y axis
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Month of hire", fontsize=14)  # X axis label and its font size
g.set_ylabels("Years Employed", fontsize=14)  # Y axis label and its font size
# Highlight the "critical" parts of the plot in a separate colour
g.axes[0,0].axvspan("2010-03", "2011-12", facecolor="gold", alpha=0.5)
g.axes[0,0].axvspan("2012-03", "2012-10", facecolor="gold", alpha=0.5)
g.axes[0,0].axvspan("2013-01", "2014-10", facecolor="gold", alpha=0.5)
def delta_unix(y, m, d):  # number of days since the start of the UNIX epoch
    n = dt.datetime(y, m, d) - dt.datetime(1970, 1, 1)
    return n.days
# On the time axis, float X coordinates are the number of days since 1970-01-01
g.axes[0,0].axline((delta_unix(2006,1,1), 7.25),
    (delta_unix(2013,6,1), 0),
    color='crimson')
g.axes[0,0].axline((delta_unix(2009,10,1), 5.5),
    (delta_unix(2015,10,1), 0),
    color='darkblue',
    linestyle='-.')
# Figure title:
g.fig.suptitle("Tenure in the company by month of hire", fontsize=16, x=0.45, y=1.025)
plt.show()
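The two reference lines in this cell are each defined by a pair of points, with X given as a day count since 1970-01-01. As a rough aid to reading them, the sketch below converts each pair into a slope expressed in years of tenure per calendar year of hire date. `slope_per_year` is a hypothetical helper, and treating a year as 365.25 days is an assumption made here. `delta_unix` is repeated so that the block runs on its own.

```python
import datetime as dt


def delta_unix(y, m, d):
    """Number of days from 1970-01-01 to the given date."""
    return (dt.datetime(y, m, d) - dt.datetime(1970, 1, 1)).days


def slope_per_year(p1, p2):
    """Slope of the line through two (day_number, years_employed) points, per 365.25 days."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1) * 365.25


# The two point pairs passed to axline in the plotting cell
solid = slope_per_year((delta_unix(2006, 1, 1), 7.25), (delta_unix(2013, 6, 1), 0))
dashed = slope_per_year((delta_unix(2009, 10, 1), 5.5), (delta_unix(2015, 10, 1), 0))
print(round(solid, 2), round(dashed, 2))
```

Both slopes come out slightly shallower than minus one year of tenure per year of later hire. A slope of exactly minus one is what staff who are still employed would trace, since their tenure is simply the time elapsed since hiring.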
CONCLUSION
Let us supplement these conclusions with one more plot.
# Build a DataFrame to study the distribution of tenure in the company by termination date
sql_quiery = \
"""
WITH
term_dates AS
(SELECT
"Date of Termination",
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed"
FROM
hr_dataset
WHERE
"Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause'
ORDER BY
"Date of Termination"
),
minimax_term_dates AS
(SELECT
"Date of Termination",
MIN("Years Employed") AS "Min Years Employed",
MAX("Years Employed") AS "Max Years Employed"
FROM
term_dates
GROUP BY
"Date of Termination"
)
SELECT
*
FROM
term_dates
LEFT JOIN
minimax_term_dates
USING ("Date of Termination")
;
"""
dfg_term_over_dt_termination = pd.read_sql(sql_quiery, conn, index_col="Date of Termination")
dfg_term_over_dt_termination
| Date of Termination | Years Employed | Min Years Employed | Max Years Employed |
|---|---|---|---|
| 2010-07-30 | 1.6 | 1.6 | 1.6 |
| 2010-08-30 | 3.2 | 3.2 | 3.2 |
| 2011-01-12 | 0.0 | 0.0 | 0.0 |
| 2011-05-14 | 0.3 | 0.3 | 0.3 |
| 2011-05-15 | 0.3 | 0.3 | 0.3 |
| 2011-05-30 | 1.1 | 1.1 | 1.1 |
| 2011-06-04 | 0.1 | 0.1 | 0.1 |
| 2011-08-04 | 0.5 | 0.5 | 0.5 |
| 2011-08-19 | 0.1 | 0.1 | 0.1 |
| 2011-09-05 | 0.2 | 0.2 | 0.2 |
| 2011-09-06 | 0.2 | 0.2 | 0.2 |
| 2011-09-15 | 0.2 | 0.2 | 0.2 |
| 2011-09-26 | 0.2 | 0.2 | 1.1 |
| 2011-09-26 | 1.1 | 0.2 | 1.1 |
| 2011-10-22 | 0.1 | 0.1 | 0.1 |
| 2011-11-15 | 0.0 | 0.0 | 0.0 |
| 2012-01-02 | 0.3 | 0.3 | 0.3 |
| 2012-01-09 | 0.3 | 0.3 | 0.8 |
| 2012-01-09 | 0.8 | 0.3 | 0.8 |
| 2012-02-04 | 0.7 | 0.7 | 0.7 |
| 2012-02-08 | 0.6 | 0.6 | 0.6 |
| 2012-04-07 | 0.7 | 0.5 | 0.7 |
| 2012-04-07 | 0.5 | 0.5 | 0.7 |
| 2012-07-02 | 1.5 | 1.5 | 1.5 |
| 2012-07-08 | 1.2 | 1.2 | 1.2 |
| 2012-08-13 | 1.4 | 1.4 | 1.4 |
| 2012-09-19 | 0.5 | 0.5 | 0.5 |
| 2012-09-23 | 1.2 | 1.2 | 1.2 |
| 2012-09-24 | 1.2 | 1.2 | 1.6 |
| 2012-09-24 | 1.6 | 1.2 | 1.6 |
| 2012-09-26 | 0.0 | 0.0 | 0.0 |
| 2012-11-30 | 1.4 | 1.4 | 1.4 |
| 2012-12-28 | 2.0 | 2.0 | 2.0 |
| 2013-01-07 | 1.7 | 1.7 | 1.7 |
| 2013-02-18 | 2.1 | 2.1 | 2.1 |
| 2013-04-01 | 4.0 | 2.1 | 4.0 |
| 2013-04-01 | 2.1 | 2.1 | 4.0 |
| 2013-04-06 | 0.5 | 0.5 | 0.5 |
| 2013-04-15 | 1.1 | 1.1 | 1.1 |
| 2013-06-05 | 2.1 | 2.1 | 2.1 |
| 2013-06-06 | 2.2 | 2.2 | 2.2 |
| 2013-06-15 | 1.2 | 1.2 | 1.2 |
| 2013-06-18 | 0.7 | 0.7 | 0.7 |
| 2013-06-24 | 2.1 | 2.1 | 2.1 |
| 2013-08-19 | 1.3 | 1.3 | 1.9 |
| 2013-08-19 | 1.9 | 1.3 | 1.9 |
| 2013-09-15 | 0.2 | 0.2 | 0.2 |
| 2013-09-25 | 2.0 | 2.0 | 2.0 |
| 2014-01-11 | 2.9 | 2.9 | 2.9 |
| 2014-01-12 | 3.0 | 3.0 | 3.0 |
| 2014-02-25 | 0.0 | 0.0 | 0.0 |
| 2014-03-31 | 1.2 | 1.2 | 1.2 |
| 2014-04-04 | 3.3 | 2.6 | 3.3 |
| 2014-04-04 | 2.6 | 2.6 | 3.3 |
| 2014-04-15 | 1.2 | 1.2 | 1.2 |
| 2014-04-24 | 2.5 | 2.5 | 2.5 |
| 2014-05-17 | 2.6 | 2.6 | 2.6 |
| 2014-08-02 | 3.0 | 3.0 | 3.0 |
| 2014-08-07 | 0.9 | 0.9 | 0.9 |
| 2014-09-04 | 3.1 | 3.1 | 3.1 |
| 2014-09-27 | 2.5 | 2.5 | 2.5 |
| 2014-10-31 | 3.7 | 3.7 | 3.7 |
| 2015-02-22 | 0.0 | 0.0 | 0.0 |
| 2015-03-15 | 0.1 | 0.1 | 0.1 |
| 2015-04-08 | 5.5 | 5.5 | 5.5 |
| 2015-05-12 | 0.4 | 0.4 | 0.4 |
| 2015-06-04 | 3.7 | 3.7 | 3.7 |
| 2015-06-08 | 4.1 | 4.1 | 4.1 |
| 2015-06-25 | 3.3 | 3.3 | 3.3 |
| 2015-06-27 | 3.5 | 3.5 | 3.5 |
| 2015-06-29 | 2.2 | 2.2 | 2.2 |
| 2015-08-15 | 4.5 | 4.5 | 4.5 |
| 2015-09-01 | 3.1 | 3.1 | 3.1 |
| 2015-09-05 | 1.2 | 1.2 | 1.2 |
| 2015-09-07 | 3.9 | 3.9 | 4.4 |
| 2015-09-07 | 4.4 | 3.9 | 4.4 |
| 2015-09-12 | 1.2 | 1.2 | 1.2 |
| 2015-09-29 | 7.2 | 7.2 | 7.2 |
| 2015-10-25 | 4.5 | 4.5 | 4.5 |
| 2015-10-31 | 0.8 | 0.8 | 0.8 |
| 2015-11-04 | 3.9 | 3.9 | 4.7 |
| 2015-11-04 | 4.7 | 3.9 | 4.7 |
| 2015-11-10 | 0.9 | 0.9 | 0.9 |
| 2015-11-11 | 3.7 | 3.7 | 3.7 |
| 2015-11-14 | 4.1 | 4.1 | 4.1 |
| 2015-11-15 | 4.5 | 4.5 | 4.5 |
| 2015-12-12 | 5.0 | 5.0 | 5.0 |
| 2015-12-15 | 4.0 | 4.0 | 4.0 |
| 2016-01-15 | 4.7 | 4.7 | 4.7 |
| 2016-01-26 | 5.1 | 5.1 | 5.1 |
| 2016-02-05 | 3.5 | 3.5 | 3.5 |
| 2016-02-08 | 4.4 | 4.4 | 4.4 |
| 2016-02-19 | 2.0 | 2.0 | 2.0 |
| 2016-02-21 | 3.2 | 3.2 | 3.2 |
| 2016-04-01 | 5.3 | 5.3 | 5.3 |
| 2016-04-29 | 4.5 | 4.5 | 4.5 |
| 2016-05-01 | 2.1 | 1.4 | 2.1 |
| 2016-05-01 | 1.4 | 1.4 | 2.1 |
| 2016-05-17 | 5.4 | 5.4 | 5.4 |
| 2016-05-18 | 5.6 | 5.6 | 5.6 |
| 2016-05-25 | 3.2 | 3.2 | 3.2 |
| 2016-06-16 | 1.2 | 1.2 | 1.2 |
# Plot tenure at the company against the month and year of termination
g = sns.relplot(data=dfg_term_over_dt_termination,
                height=5,
                aspect=2.5,
                palette="tab10_r",
                alpha=1,
                kind="line",
                marker='o'
                )
# Build the X-axis ticks as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_term_over_dt_termination.index.min(),
                           end=dfg_term_over_dt_termination.index.max(),
                           freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90)  # set the X ticks and their label size
plt.yticks(fontsize=14)  # set the Y tick label size
# Format the X tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Month of termination", fontsize=14)  # X-axis label and its size
g.set_ylabels("Years Employed", fontsize=14)  # Y-axis label and its size
# Shade the "critical" part of the chart in a separate color
g.axes[0,0].axvspan("2015-03", "2016-07", facecolor="deeppink", alpha=0.25)
def delta_unix(y, m, d):  # number of days since the start of the UNIX epoch
    n = dt.datetime(y, m, d) - dt.datetime(1970, 1, 1)
    return n.days
# On a date X axis, float coordinates are the number of days since 1970-01-01
g.axes[0,0].axline((delta_unix(2011,1,1), 0),
                   (delta_unix(2016,6,1), 5.5),
                   color='crimson')
g.axes[0,0].axline((delta_unix(2015,3,1), 0),
                   (delta_unix(2016,7,1), 1.2),
                   color='midnightblue',
                   linestyle='-.')
# Chart title:
g.fig.suptitle("Tenure at the company by month of termination", fontsize=16, x=0.45, y=1.025)
plt.show()
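The two `axline` trend lines above are anchored with `delta_unix`, which turns a calendar date into days since 1970-01-01. As a cross-check, the sketch below reproduces that conversion with the standard library only and works out the slope of the crimson line. The helper name `days_since_epoch` and the 365-day year are illustrative assumptions, not part of the notebook.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def days_since_epoch(y: int, m: int, d: int) -> int:
    """Days between 1970-01-01 and the given date (hypothetical helper mirroring delta_unix)."""
    return (date(y, m, d) - EPOCH).days

# End points of the crimson line: (2011-01-01, 0 years) and (2016-06-01, 5.5 years)
x0, y0 = days_since_epoch(2011, 1, 1), 0.0
x1, y1 = days_since_epoch(2016, 6, 1), 5.5

slope_per_day = (y1 - y0) / (x1 - x0)   # years of tenure gained per calendar day
slope_per_year = slope_per_day * 365    # assumption: 365-day calendar year

print(x0, x1, round(slope_per_year, 2))
```

A slope close to one year of tenure per calendar year is what one would expect if the longest-serving leavers had all joined around the start of 2011, although that reading is an interpretation rather than something the chart proves.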
CONCLUSION
In addition to the conclusions drawn from the previous chart, the following can be noted.
# Build a DataFrame to study how tenure at the company depends on the recruiting cost per hire
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
--"Employee Number",
(ROUND(AVG("Days Employed"::numeric) / 360, 1)) AS avg_years_employed,
"Employee Source" AS empl_source
FROM
hr_dataset
GROUP BY
"Employee Source"
),
employee_count_per_source AS
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source
FROM
hr_dataset
GROUP BY
"Employee Source"
),
empl_source_price AS
(SELECT
"Employment Source" AS empl_source,
"Total"
FROM
recruiting_costs
),
empl_price_and_term AS
(SELECT
empl_source,
ROUND("Total"::numeric/empl_count, 2) AS avg_employee_price,
avg_years_employed,
empl_count
FROM
(employee_count_per_source
LEFT JOIN
empl_source_price
USING (empl_source)
)
LEFT JOIN
employee_selection
USING (empl_source)
)
SELECT
CONCAT(empl_source, ': $', avg_employee_price) AS empl_source,
avg_employee_price,
avg_years_employed,
empl_count
FROM
empl_price_and_term
ORDER BY
avg_employee_price DESC
;
"""
dfg_term_over_source_price = pd.read_sql(sql_quiery, conn).dropna()
dfg_term_over_source_price
| | empl_source | avg_employee_price | avg_years_employed | empl_count |
|---|---|---|---|---|
| 1 | Careerbuilder: $7790.00 | 7790.00 | 6.7 | 1 |
| 2 | Pay Per Click: $1323.00 | 1323.00 | 0.0 | 1 |
| 3 | MBTA ads: $645.88 | 645.88 | 4.4 | 17 |
| 4 | On-campus Recruiting: $625.00 | 625.00 | 3.4 | 12 |
| 5 | Website Banner Ads: $549.46 | 549.46 | 4.1 | 13 |
| 6 | Social Networks - Facebook Twitter etc: $506.64 | 506.64 | 3.0 | 11 |
| 7 | Newspager/Magazine: $460.61 | 460.61 | 3.3 | 18 |
| 8 | Other: $443.89 | 443.89 | 5.3 | 9 |
| 9 | Billboard: $387.00 | 387.00 | 4.7 | 16 |
| 10 | Diversity Job Fair: $345.55 | 345.55 | 3.7 | 29 |
| 11 | Monster.com: $240.00 | 240.00 | 3.9 | 24 |
| 12 | Search Engine - Google Bing Yahoo: $207.32 | 207.32 | 3.7 | 25 |
| 13 | Pay Per Click - Google: $167.10 | 167.10 | 3.6 | 21 |
| 14 | Professional Society: $60.00 | 60.00 | 3.9 | 20 |
| 15 | Company Intranet - Partner: $0.00 | 0.00 | 1.2 | 1 |
| 16 | Information Session: $0.00 | 0.00 | 4.3 | 4 |
| 17 | Internet Search: $0.00 | 0.00 | 3.5 | 6 |
| 18 | Employee Referral: $0.00 | 0.00 | 3.5 | 31 |
| 19 | On-line Web application: $0.00 | 0.00 | 0.5 | 1 |
| 20 | Vendor Referral: $0.00 | 0.00 | 2.8 | 15 |
| 21 | Word of Mouth: $0.00 | 0.00 | 2.8 | 13 |
| 22 | Glassdoor: $0.00 | 0.00 | 3.2 | 14 |
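The query above divides each source's total recruiting spend by the number of employees hired through it. The sketch below replays that logic on a tiny in-memory SQLite database. The table contents are made up for illustration, SQLite stands in for PostgreSQL, and the simplified names (`hr`, `costs`, `source`, `days_employed`, `total`) are assumptions.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE hr (source TEXT, days_employed INTEGER);
    CREATE TABLE costs (source TEXT, total REAL);
    -- made-up rows, not taken from the real database
    INSERT INTO hr VALUES ('Billboard', 1800), ('Billboard', 1440),
                          ('Referral', 1080), ('Unlisted', 720);
    INSERT INTO costs VALUES ('Billboard', 800.0), ('Referral', 0.0);
""")

rows = conn_demo.execute("""
    WITH per_source AS (
        SELECT source,
               COUNT(*) AS empl_count,
               ROUND(AVG(days_employed) / 360.0, 1) AS avg_years_employed
        FROM hr
        GROUP BY source
    )
    SELECT p.source,
           ROUND(c.total / p.empl_count, 2) AS avg_employee_price,
           p.avg_years_employed,
           p.empl_count
    FROM per_source AS p
    LEFT JOIN costs AS c ON c.source = p.source
    ORDER BY p.source
""").fetchall()

for row in rows:
    print(row)
```

A source missing from the cost table comes back with a NULL price, which is why the notebook calls `.dropna()` on the result. Folding the count and the average into one CTE is a simplification of the two separate CTEs used above.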
# Plot average tenure at the company against the recruiting cost per hire
g = sns.relplot(
    data=dfg_term_over_source_price[dfg_term_over_source_price.avg_employee_price <= 2000],
    x="avg_years_employed",
    y="avg_employee_price",
    hue="empl_source",
    size="empl_count",
    height=6,
    aspect=1.75,
    palette="tab20",
    legend='auto',
)
# Chart title:
g.fig.suptitle("Average tenure at the company vs. recruiting cost per hire\n"
               "(sources costing more than $2000 per hire excluded)",
               fontsize=16, x=0.5, y=1.075)
plt.show()
To draw the chart above, the anomalous recruiting cost of $7,790 per hire (Careerbuilder, a single employee) had to be excluded. A logarithmic scale would have been an alternative, but it would have dropped the sources whose recruiting cost is zero.
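One way to keep zero-cost sources while still compressing a large outlier is a `log1p` transform (the logarithm of one plus the value), which maps 0 to 0. The sketch below applies it to a few of the per-hire costs from the table above. It is offered as an option, not as the approach the notebook takes.

```python
import math

# Per-hire recruiting costs taken from the table above (USD)
prices = {
    "Careerbuilder": 7790.00,
    "MBTA ads": 645.88,
    "Professional Society": 60.00,
    "Employee Referral": 0.00,
}

# log1p(x) = ln(1 + x): defined at zero and monotonic, so the ranking of sources is preserved
transformed = {source: math.log1p(price) for source, price in prices.items()}

for source, value in transformed.items():
    print(f"{source}: {value:.2f}")
```

In Matplotlib a comparable effect is available through `ax.set_yscale('symlog')`, which is linear near zero and logarithmic further out.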
CONCLUSION
sql_quiery = \
"""
SELECT
department,
age AS "Age in years",
"Days Employed"
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
department
"""
dfg_age_term_dependance = pd.read_sql(sql_quiery, conn)
# dfg_age_term_dependance
# Plot tenure against age
g = sns.relplot(x="Age in years",
                y="Days Employed",
                hue="department",
                alpha=1.0,
                palette="Set1",
                height=6,
                data=dfg_age_term_dependance)
g.fig.suptitle("Tenure at the company vs. employee age, by department", fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=9)
# Set the Y-axis label size
g.set_ylabels(fontsize=9)
plt.show()
sql_quiery = \
"""
SELECT
department,
age AS "Age in years",
"Days Employed"
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
ORDER BY
department
"""
dfg_age_term_dependance = pd.read_sql(sql_quiery, conn)
# dfg_age_term_dependance
# Plot tenure against age with a linear regression fit per department
g = sns.lmplot(x="Age in years",
               y="Days Employed",
               hue="department",
               palette="Set1",
               height=6,
               data=dfg_age_term_dependance,
               facet_kws=dict(ylim=(0,4500)))
g.fig.suptitle("Tenure at the company vs. employee age, by department (linear regression)",
               fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=9)
# Set the Y-axis label size
g.set_ylabels(fontsize=9)
plt.show()
CONCLUSION
No clear relationship between age and tenure at the company is visible. (Admin Offices has very few data points; for that department and for Software Engineering the confidence interval is too wide; the slope for IT/IS is very small; and for the remaining departments the regression line is parallel to the X axis.)
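A correlation coefficient gives a numeric counterpart to the visual reading of the regression lines. The sketch below computes Pearson's r with the standard library. The age and tenure values are invented for illustration, and `pearson_r` is a hypothetical helper, so the printed number says nothing about the real dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented sample: ages in years and tenure in days
ages = [25, 31, 38, 44, 52, 60]
days_employed = [900, 1500, 700, 1600, 800, 1400]

r = pearson_r(ages, days_employed)
print(f"Pearson r = {r:.2f}")
```

Values of r near zero are consistent with flat regression lines like those described above. With only a few points per department, the coefficient would carry a wide uncertainty as well.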
sql_quiery = \
"""
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
sex,
COUNT("Employee Number") AS empl_count
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
ROUND("Days Employed"::numeric / 360, 1),
sex
ORDER BY
ROUND("Days Employed"::numeric / 360, 1) DESC
;
"""
dfg_term_over_sex = pd.read_sql(sql_quiery, conn)
#dfg_term_over_sex
# Plot the number of current employees against tenure, by sex
g = sns.lmplot(x="Years Employed",
               y="empl_count",
               hue="sex",
               palette="seismic",
               height=6,
               data=dfg_term_over_sex,
               )
g.fig.suptitle("Tenure at the company by sex for current employees (linear regression)",
               fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=9)
# Set the Y-axis label size
g.set_ylabels(fontsize=9)
plt.show()
sql_quiery = \
"""
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
sex,
"Employee Number"
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
"Employee Number",
"Days Employed",
sex
ORDER BY
"Days Employed"
;
"""
dfg_term_over_sex1 = pd.read_sql(sql_quiery, conn)
#dfg_term_over_sex1
# Plot the distribution of tenure by sex
g = sns.catplot(x="sex",
                y="Years Employed",
                hue="sex",
                palette="seismic_r",
                height=6,
                data=dfg_term_over_sex1,
                saturation=0.5,
                kind='boxen',
                dodge=False,
                k_depth='proportion'
                )
# Title
g.fig.suptitle("Tenure at the company by sex for current employees (distribution)",
               fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=16)
# Set the Y-axis label size
g.set_ylabels(fontsize=16)
plt.show()
CONCLUSION
The company's workforce is predominantly female. The relationship between tenure at the company and sex shows up as follows.
sql_quiery = \
"""
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
racedesc,
COUNT("Employee Number") AS empl_count
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
ROUND("Days Employed"::numeric / 360, 1),
racedesc
ORDER BY
ROUND("Days Employed"::numeric / 360, 1) DESC
;
"""
dfg_term_over_race = pd.read_sql(sql_quiery, conn)
#dfg_term_over_race
# Plot the number of current employees against tenure, by race/ethnicity
g = sns.lmplot(x="Years Employed",
               y="empl_count",
               hue="racedesc",
               palette="Dark2",
               height=6,
               aspect=1.25,
               data=dfg_term_over_race,
               )
g.fig.suptitle("Tenure at the company by race/ethnicity\n"
               "for current employees (linear regression)",
               fontsize=16, y=1.15)
# Set the X-axis label size
g.set_xlabels(fontsize=9)
# Set the Y-axis label size
g.set_ylabels(fontsize=9)
plt.show()
sql_quiery = \
"""
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
racedesc,
"Employee Number"
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
"Employee Number",
"Days Employed",
racedesc
ORDER BY
"Days Employed"
;
"""
dfg_term_over_race1 = pd.read_sql(sql_quiery, conn)
#dfg_term_over_race1
# Plot the distribution of tenure by race/ethnicity
g = sns.catplot(x="racedesc",
                y="Years Employed",
                hue="racedesc",
                palette="Dark2",
                height=6,
                aspect=1.25,
                data=dfg_term_over_race1,
                saturation=0.5,
                kind='boxen',
                dodge=False,
                k_depth='proportion'
                )
# Title
g.fig.suptitle("Tenure at the company by race/ethnicity\n"
               "for current employees (distribution)",
               fontsize=16, y=1.15)
# Set the X-axis label size
g.set_xlabels(fontsize=16)
# Set the Y-axis label size
g.set_ylabels(fontsize=16)
plt.xticks(fontsize=10, rotation=45)  # set the rotation and size of the X tick labels
plt.show()
CONCLUSION
sql_quiery = \
"""
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
maritaldesc,
COUNT("Employee Number") AS empl_count
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
ROUND("Days Employed"::numeric / 360, 1),
maritaldesc
ORDER BY
ROUND("Days Employed"::numeric / 360, 1) DESC
;
"""
dfg_term_over_marital = pd.read_sql(sql_quiery, conn)
#dfg_term_over_marital
# Plot the number of current employees against tenure, by marital status
g = sns.lmplot(x="Years Employed",
               y="empl_count",
               hue="maritaldesc",
               palette="tab10",
               height=6,
               aspect=1.25,
               data=dfg_term_over_marital,
               )
g.fig.suptitle("Tenure at the company by marital status\n"
               "for current employees (linear regression)",
               fontsize=16, y=1.15)
# Set the X-axis label size
g.set_xlabels(fontsize=9)
# Set the Y-axis label size
g.set_ylabels(fontsize=9)
plt.show()
sql_quiery = \
"""
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS "Years Employed",
maritaldesc,
"Employee Number"
FROM
hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
"Employee Number",
"Days Employed",
maritaldesc
ORDER BY
"Days Employed"
;
"""
dfg_term_over_marital1 = pd.read_sql(sql_quiery, conn)
#dfg_term_over_marital1
# Plot the distribution of tenure by marital status
g = sns.catplot(x="maritaldesc",
                y="Years Employed",
                hue="maritaldesc",
                palette="tab10",
                height=6,
                aspect=1.25,
                data=dfg_term_over_marital1,
                saturation=0.5,
                kind='boxen',
                dodge=False,
                k_depth='proportion'
                )
# Title
g.fig.suptitle("Tenure at the company by marital status\n"
               "for current employees (distribution)",
               fontsize=16, y=1.15)
# Set the X-axis label size
g.set_xlabels(fontsize=16)
# Set the Y-axis label size
g.set_ylabels(fontsize=16)
plt.xticks(fontsize=12, rotation=0)  # set the rotation and size of the X tick labels
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build a query and a DataFrame with employment statuses and performance scores.
-- It is used to draw the chart
WITH StatusPerScore AS
(SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY "Performance Score") AS employees_per_score,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY "Performance Score"))*100, 2)
AS percent,
"Employment Status",
"Performance Score"
FROM
hr_dataset
GROUP BY
"Employment Status",
"Performance Score"
ORDER BY
"Employment Status",
"Performance Score"
)
SELECT
"Performance Score",
SUM(
CASE WHEN "Employment Status" = 'Future Start'
THEN percent
END)
AS "Future Start",
SUM(
CASE WHEN "Employment Status" = 'Active'
THEN percent
END)
AS "Active",
SUM(
CASE WHEN "Employment Status" = 'Leave of Absence'
THEN percent
END)
AS "Leave of Absence",
SUM(
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN percent
END)
AS "Voluntarily Terminated",
SUM(
CASE WHEN "Employment Status" = 'Terminated for Cause'
THEN percent
END)
AS "Terminated for Cause",
SUM(empl_count) AS "Employees_Count"
FROM
StatusPerScore
GROUP BY
"Performance Score"
ORDER BY
"Performance Score"
"""
dfg_StatusPerScore = pd.read_sql(sql_quiery, conn, index_col="Performance Score").fillna(0)
dfg_StatusPerScore
| Performance Score | Future Start | Active | Leave of Absence | Voluntarily Terminated | Terminated for Cause | Employees_Count |
|---|---|---|---|---|---|---|
| 90-day meets | 0.00 | 51.61 | 6.45 | 38.71 | 3.23 | 31.0 |
| Exceeds | 7.14 | 64.29 | 0.00 | 25.00 | 3.57 | 28.0 |
| Exceptional | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 9.0 |
| Fully Meets | 0.00 | 62.43 | 6.63 | 28.18 | 2.76 | 181.0 |
| N/A- too early to review | 24.32 | 40.54 | 0.00 | 29.73 | 5.41 | 37.0 |
| Needs Improvement | 0.00 | 46.67 | 0.00 | 26.67 | 26.67 | 15.0 |
| PIP | 0.00 | 55.56 | 0.00 | 33.33 | 11.11 | 9.0 |
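The query above pivots long-format counts into one column per employment status with `SUM(CASE WHEN ... END)`. The sketch below shows the same conditional-aggregation pattern on a toy SQLite table. The rows are invented, and the group totals come from a join rather than the window function used above, purely to keep the example small.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE staff (score TEXT, status TEXT);
    -- invented rows for illustration only
    INSERT INTO staff VALUES
        ('Exceeds', 'Active'), ('Exceeds', 'Active'),
        ('Exceeds', 'Active'), ('Exceeds', 'Voluntarily Terminated'),
        ('PIP', 'Active'), ('PIP', 'Terminated for Cause');
""")

pivot = db.execute("""
    WITH counts AS (
        SELECT score, status, COUNT(*) AS n
        FROM staff
        GROUP BY score, status
    ),
    totals AS (
        SELECT score, COUNT(*) AS total
        FROM staff
        GROUP BY score
    )
    SELECT c.score,
           SUM(CASE WHEN c.status = 'Active'
                    THEN ROUND(100.0 * c.n / t.total, 2) END) AS active,
           SUM(CASE WHEN c.status = 'Voluntarily Terminated'
                    THEN ROUND(100.0 * c.n / t.total, 2) END) AS voluntarily_terminated,
           SUM(CASE WHEN c.status = 'Terminated for Cause'
                    THEN ROUND(100.0 * c.n / t.total, 2) END) AS terminated_for_cause,
           SUM(c.n) AS employees_count
    FROM counts AS c
    JOIN totals AS t ON t.score = c.score
    GROUP BY c.score
    ORDER BY c.score
""").fetchall()

for row in pivot:
    print(row)
```

Statuses that never occur for a score come back as NULL, which the notebook turns into 0 with `.fillna(0)`.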
data = dfg_StatusPerScore.iloc[:,0:5]  # data for the chart
category_names = data.columns.tolist()  # list of chart categories
labels = data.index.tolist()  # list of bar labels
data_cum = data.cumsum(axis=1)  # DataFrame with running totals across each row (here, percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] - the number of columns in our data (defined in data= above)
# np.linspace - spreads data.shape[1] values evenly over the range 0.2 to 0.8
# plt.colormaps[name](values) - returns the colormap's RGBA colors at the np.linspace values
category_colors = plt.colormaps['cividis'](np.linspace(0.2, 0.8, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15,5))  # create the figure and axes
ax.invert_yaxis()  # reverse the order in which the bars are listed
# ax.xaxis.set_visible(False)  # left commented out: the X-axis ticks stay visible
ax.set_xlim(0, data.sum(axis=1).max())  # to stay generic, set the X-axis limits
# from 0 to the largest row sum. In our case the upper limit is always 100
# Draw the chart in a loop, one category at a time,
# taking the column and the color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of this column in data)
    starts = data_cum.iloc[:, i] - widths  # segment start positions, derived from the running totals
    rects = ax.barh(labels, widths, left=starts, height=0.8,
                    label=colname, color=color)  # draw the segments for this category
    # choose the label color from the RGB values of the segment color:
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # Label the segments
    ax.bar_label(rects, label_type='center', color=text_color)
# Legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.125, 1),
          loc='lower left', fontsize='small')
# Title
ax.set_title('Employment status by performance score, %', pad=40)
ax.title.set_size(14)
plt.show()
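The stacked bar chart places each segment at the running total of the segments before it. The sketch below reproduces that arithmetic for the "90-day meets" row of the status table, using only the standard library. The variable names mirror the notebook's `widths`, `data_cum`, and `starts`.

```python
from itertools import accumulate

statuses = ["Future Start", "Active", "Leave of Absence",
            "Voluntarily Terminated", "Terminated for Cause"]
# Percentages for the "90-day meets" row of the status table
widths = [0.00, 51.61, 6.45, 38.71, 3.23]

# Running total across the row, the counterpart of data.cumsum(axis=1)
data_cum = list(accumulate(widths))
# Each segment starts where the previous ones end: cumulative total minus its own width
starts = [round(total - width, 2) for total, width in zip(data_cum, widths)]

for status, start, width in zip(statuses, starts, widths):
    print(f"{status}: starts at {start}, width {width}")
```

Subtracting the width from the cumulative total avoids a separate shifted cumulative sum, which is why the notebook computes `data_cum` once and reuses it for every category.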
CONCLUSION
sql_quiery = \
"""
-- Create a temporary view, a query and a DataFrame with performance score and department.
-- It is used to draw the charts
CREATE OR REPLACE TEMPORARY VIEW
PerformancePerDepartment AS
SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY department) AS employees_per_department,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY department))*100, 2) AS percent,
"Performance Score",
department
FROM
hr_dataset
GROUP BY
department,
"Performance Score"
ORDER BY
department,
"Performance Score"
;
SELECT * FROM PerformancePerDepartment
;
"""
dfg_PerformancePerDepartment1 = pd.read_sql(sql_quiery, conn)
#dfg_PerformancePerDepartment1
# Draw a grid of charts showing performance score by department
g = sns.catplot(
    kind="bar",
    x="department",
    y="percent",
    hue="department",
    data=dfg_PerformancePerDepartment1,
    col_order=['N/A- too early to review', '90-day meets',
               'Fully Meets', 'Exceptional', 'Exceeds', 'Needs Improvement', 'PIP'],
    col="Performance Score",
    col_wrap=4,
    height=4,
    aspect = 1,
    palette="Set2",
    margin_titles=True,
    sharex=False,
    sharey=False,
    dodge=False
)
g.fig.suptitle(
    "Employee performance score by department",
    fontsize=16, x=0.50, y=1.03)
# Set the X-axis label and its size
g.set_xlabels('Department', fontsize=12)
# Set the Y-axis label size
g.set_ylabels(fontsize=12)
# Set the size and rotation of the labels under the bars
g.set_xticklabels(rotation=75,fontsize=10)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=0.8)
plt.show()
Let us show the same data in a different form
sql_quiery = \
"""
-- Build a query and a DataFrame with performance score and department.
-- It reuses the temporary view created earlier
-- and is used to draw the chart
SELECT
department,
SUM(
CASE WHEN "Performance Score" = 'N/A- too early to review'
THEN percent
END)
AS "N/A- too early to review",
SUM(
CASE WHEN "Performance Score" = '90-day meets'
THEN percent
END)
AS "90-day meets",
SUM(
CASE WHEN "Performance Score" = 'Fully Meets'
THEN percent
END)
AS "Fully Meets",
SUM(
CASE WHEN "Performance Score" = 'Exceptional'
THEN percent
END)
AS "Exceptional",
SUM(
CASE WHEN "Performance Score" = 'Exceeds'
THEN percent
END)
AS "Exceeds",
SUM(
CASE WHEN "Performance Score" = 'Needs Improvement'
THEN percent
END)
AS "Needs Improvement",
SUM(
CASE WHEN "Performance Score" = 'PIP'
THEN percent
END)
AS "PIP"
FROM
PerformancePerDepartment
GROUP BY
department
ORDER BY
department
;
"""
dfg_PerformancePerDepartment2 = pd.read_sql(sql_quiery, conn, index_col="department").fillna(0)
dfg_PerformancePerDepartment2
| department | N/A- too early to review | 90-day meets | Fully Meets | Exceptional | Exceeds | Needs Improvement | PIP |
|---|---|---|---|---|---|---|---|
| Admin Offices | 20.00 | 10.00 | 70.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Executive Office | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| IT/IS | 18.00 | 16.00 | 52.00 | 8.00 | 4.00 | 2.00 | 0.00 |
| Production | 12.02 | 8.65 | 57.69 | 1.92 | 11.06 | 5.29 | 3.37 |
| Sales | 3.23 | 6.45 | 70.97 | 0.00 | 6.45 | 6.45 | 6.45 |
| Software Engineering | 0.00 | 20.00 | 50.00 | 10.00 | 10.00 | 10.00 | 0.00 |
data = dfg_PerformancePerDepartment2  # data for the chart
category_names = data.columns.tolist()  # list of chart categories
labels = data.index.tolist()  # list of bar labels
data_cum = data.cumsum(axis=1)  # DataFrame with running totals across each row (here, percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] - the number of columns in our data
# np.linspace - spreads data.shape[1] values evenly over the range 0 to 1
# plt.colormaps[name](values) - returns the colormap's RGBA colors at the np.linspace values
category_colors = plt.colormaps['YlGnBu'](np.linspace(0, 1, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15,5))  # create the figure and axes
ax.invert_yaxis()  # reverse the order in which the bars are listed
# ax.xaxis.set_visible(False)  # left commented out: the X-axis ticks stay visible
ax.set_xlim(0, data.sum(axis=1).max())  # to stay generic, set the X-axis limits
# from 0 to the largest row sum. In our case the upper limit is always 100
# Draw the chart in a loop, one category at a time,
# taking the column and the color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of this column in data)
    starts = data_cum.iloc[:, i] - widths  # segment start positions, derived from the running totals
    rects = ax.barh(labels, widths, left=starts, height=0.8,
                    label=colname, color=color)  # draw the segments for this category
    # choose the label color from the RGB values of the segment color:
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # Label the segments
    ax.bar_label(rects, label_type='center', color=text_color)
# Legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Title
ax.set_title('Performance score by department, %', pad=40)
ax.title.set_size(14)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Create a temporary view, a query and a DataFrame with performance score and the employee's manager.
-- It is used to draw the grid of charts
CREATE OR REPLACE TEMPORARY VIEW
PerformancePerChief AS
WITH
chiefs AS (
SELECT
"Employee Number",
"Manager Name",
"Performance Score",
CONCAT ("Manager Name", ' (' ,department, ')' ) AS chief
FROM
hr_dataset
)
SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY chief) AS employees_per_chief,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY chief))*100, 2) AS percent,
"Performance Score",
chief
FROM
chiefs
GROUP BY
chief,
"Performance Score"
ORDER BY
chief,
"Performance Score"
;
SELECT * FROM PerformancePerChief
;
"""
df_PerformancePerChief = pd.read_sql(sql_quiery, conn)
df_PerformancePerChief
| | empl_count | employees_per_chief | percent | Performance Score | chief |
|---|---|---|---|---|---|
| 0 | 2 | 9.0 | 22.22 | 90-day meets | Alex Sweetwater (Software Engineering) |
| 1 | 1 | 9.0 | 11.11 | Exceeds | Alex Sweetwater (Software Engineering) |
| 2 | 1 | 9.0 | 11.11 | Exceptional | Alex Sweetwater (Software Engineering) |
| 3 | 4 | 9.0 | 44.44 | Fully Meets | Alex Sweetwater (Software Engineering) |
| 4 | 1 | 9.0 | 11.11 | Needs Improvement | Alex Sweetwater (Software Engineering) |
| 5 | 1 | 21.0 | 4.76 | 90-day meets | Amy Dunn (Production ) |
| 6 | 2 | 21.0 | 9.52 | Exceeds | Amy Dunn (Production ) |
| 7 | 1 | 21.0 | 4.76 | Exceptional | Amy Dunn (Production ) |
| 8 | 11 | 21.0 | 52.38 | Fully Meets | Amy Dunn (Production ) |
| 9 | 5 | 21.0 | 23.81 | N/A- too early to review | Amy Dunn (Production ) |
| 10 | 1 | 21.0 | 4.76 | PIP | Amy Dunn (Production ) |
| 11 | 1 | 1.0 | 100.00 | Fully Meets | Board of Directors (Admin Offices) |
| 12 | 1 | 1.0 | 100.00 | Fully Meets | Board of Directors (Executive Office) |
| 13 | 1 | 7.0 | 14.29 | 90-day meets | Brandon R. LeBlanc (Admin Offices) |
| 14 | 4 | 7.0 | 57.14 | Fully Meets | Brandon R. LeBlanc (Admin Offices) |
| 15 | 2 | 7.0 | 28.57 | N/A- too early to review | Brandon R. LeBlanc (Admin Offices) |
| 16 | 1 | 21.0 | 4.76 | 90-day meets | Brannon Miller (Production ) |
| 17 | 5 | 21.0 | 23.81 | Exceeds | Brannon Miller (Production ) |
| 18 | 2 | 21.0 | 9.52 | Exceptional | Brannon Miller (Production ) |
| 19 | 8 | 21.0 | 38.10 | Fully Meets | Brannon Miller (Production ) |
| 20 | 1 | 21.0 | 4.76 | Needs Improvement | Brannon Miller (Production ) |
| 21 | 4 | 21.0 | 19.05 | PIP | Brannon Miller (Production ) |
| 22 | 8 | 8.0 | 100.00 | Fully Meets | Brian Champaigne (IT/IS) |
| 23 | 1 | 21.0 | 4.76 | 90-day meets | David Stanley (Production ) |
| 24 | 1 | 21.0 | 4.76 | Exceeds | David Stanley (Production ) |
| 25 | 15 | 21.0 | 71.43 | Fully Meets | David Stanley (Production ) |
| 26 | 4 | 21.0 | 19.05 | N/A- too early to review | David Stanley (Production ) |
| 27 | 2 | 3.0 | 66.67 | Fully Meets | Debra Houlihan (Sales) |
| 28 | 1 | 3.0 | 33.33 | Needs Improvement | Debra Houlihan (Sales) |
| 29 | 3 | 22.0 | 13.64 | 90-day meets | Elijiah Gray (Production ) |
| 30 | 2 | 22.0 | 9.09 | Exceeds | Elijiah Gray (Production ) |
| 31 | 13 | 22.0 | 59.09 | Fully Meets | Elijiah Gray (Production ) |
| 32 | 3 | 22.0 | 13.64 | N/A- too early to review | Elijiah Gray (Production ) |
| 33 | 1 | 22.0 | 4.55 | Needs Improvement | Elijiah Gray (Production ) |
| 34 | 1 | 4.0 | 25.00 | Exceeds | Eric Dougall (IT/IS) |
| 35 | 3 | 4.0 | 75.00 | Fully Meets | Eric Dougall (IT/IS) |
| 36 | 2 | 2.0 | 100.00 | Fully Meets | Janet King (Admin Offices) |
| 37 | 1 | 1.0 | 100.00 | Exceptional | Janet King (IT/IS) |
| 38 | 3 | 15.0 | 20.00 | Exceeds | Janet King (Production ) |
| 39 | 10 | 15.0 | 66.67 | Fully Meets | Janet King (Production ) |
| 40 | 2 | 15.0 | 13.33 | Needs Improvement | Janet King (Production ) |
| 41 | 1 | 1.0 | 100.00 | Fully Meets | Janet King (Sales) |
| 42 | 1 | 6.0 | 16.67 | Exceeds | Jennifer Zamora (IT/IS) |
| 43 | 1 | 6.0 | 16.67 | Exceptional | Jennifer Zamora (IT/IS) |
| 44 | 3 | 6.0 | 50.00 | Fully Meets | Jennifer Zamora (IT/IS) |
| 45 | 1 | 6.0 | 16.67 | Needs Improvement | Jennifer Zamora (IT/IS) |
| 46 | 1 | 1.0 | 100.00 | Fully Meets | Jennifer Zamora (Software Engineering) |
| 47 | 1 | 14.0 | 7.14 | 90-day meets | John Smith (Sales) |
| 48 | 11 | 14.0 | 78.57 | Fully Meets | John Smith (Sales) |
| 49 | 1 | 14.0 | 7.14 | Needs Improvement | John Smith (Sales) |
| 50 | 1 | 14.0 | 7.14 | PIP | John Smith (Sales) |
| 51 | 3 | 22.0 | 13.64 | 90-day meets | Kelley Spirea (Production ) |
| 52 | 3 | 22.0 | 13.64 | Exceeds | Kelley Spirea (Production ) |
| 53 | 15 | 22.0 | 68.18 | Fully Meets | Kelley Spirea (Production ) |
| 54 | 1 | 22.0 | 4.55 | N/A- too early to review | Kelley Spirea (Production ) |
| 55 | 2 | 21.0 | 9.52 | 90-day meets | Ketsia Liebig (Production ) |
| 56 | 2 | 21.0 | 9.52 | Exceeds | Ketsia Liebig (Production ) |
| 57 | 14 | 21.0 | 66.67 | Fully Meets | Ketsia Liebig (Production ) |
| 58 | 2 | 21.0 | 9.52 | N/A- too early to review | Ketsia Liebig (Production ) |
| 59 | 1 | 21.0 | 4.76 | Needs Improvement | Ketsia Liebig (Production ) |
| 60 | 3 | 22.0 | 13.64 | 90-day meets | Kissy Sullivan (Production ) |
| 61 | 1 | 22.0 | 4.55 | Exceeds | Kissy Sullivan (Production ) |
| 62 | 1 | 22.0 | 4.55 | Exceptional | Kissy Sullivan (Production ) |
| 63 | 14 | 22.0 | 63.64 | Fully Meets | Kissy Sullivan (Production ) |
| 64 | 1 | 22.0 | 4.55 | N/A- too early to review | Kissy Sullivan (Production ) |
| 65 | 1 | 22.0 | 4.55 | Needs Improvement | Kissy Sullivan (Production ) |
| 66 | 1 | 22.0 | 4.55 | PIP | Kissy Sullivan (Production ) |
| 67 | 1 | 13.0 | 7.69 | 90-day meets | Lynn Daneault (Sales) |
| 68 | 2 | 13.0 | 15.38 | Exceeds | Lynn Daneault (Sales) |
| 69 | 8 | 13.0 | 61.54 | Fully Meets | Lynn Daneault (Sales) |
| 70 | 1 | 13.0 | 7.69 | N/A- too early to review | Lynn Daneault (Sales) |
| 71 | 1 | 13.0 | 7.69 | PIP | Lynn Daneault (Sales) |
| 72 | 1 | 22.0 | 4.55 | 90-day meets | Michael Albert (Production ) |
| 73 | 2 | 22.0 | 9.09 | Exceeds | Michael Albert (Production ) |
| 74 | 9 | 22.0 | 40.91 | Fully Meets | Michael Albert (Production ) |
| 75 | 6 | 22.0 | 27.27 | N/A- too early to review | Michael Albert (Production ) |
| 76 | 3 | 22.0 | 13.64 | Needs Improvement | Michael Albert (Production ) |
| 77 | 1 | 22.0 | 4.55 | PIP | Michael Albert (Production ) |
| 78 | 3 | 14.0 | 21.43 | 90-day meets | Peter Monroe (IT/IS) |
| 79 | 6 | 14.0 | 42.86 | Fully Meets | Peter Monroe (IT/IS) |
| 80 | 5 | 14.0 | 35.71 | N/A- too early to review | Peter Monroe (IT/IS) |
| 81 | 5 | 17.0 | 29.41 | 90-day meets | Simon Roup (IT/IS) |
| 82 | 2 | 17.0 | 11.76 | Exceptional | Simon Roup (IT/IS) |
| 83 | 6 | 17.0 | 35.29 | Fully Meets | Simon Roup (IT/IS) |
| 84 | 4 | 17.0 | 23.53 | N/A- too early to review | Simon Roup (IT/IS) |
| 85 | 3 | 21.0 | 14.29 | 90-day meets | Webster Butler (Production ) |
| 86 | 2 | 21.0 | 9.52 | Exceeds | Webster Butler (Production ) |
| 87 | 11 | 21.0 | 52.38 | Fully Meets | Webster Butler (Production ) |
| 88 | 3 | 21.0 | 14.29 | N/A- too early to review | Webster Butler (Production ) |
| 89 | 2 | 21.0 | 9.52 | Needs Improvement | Webster Butler (Production ) |
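Percentages computed for managers with only a handful of reports swing widely with each individual, so they are hard to compare with those of larger teams. The sketch below filters a few rows from the table above by team size. The threshold of 10 reports and the name `MIN_TEAM_SIZE` are assumptions chosen for illustration.

```python
# (chief, employees_per_chief, percent rated "Fully Meets"), copied from the table above
chief_rows = [
    ("Janet King (Sales)", 1, 100.00),
    ("Debra Houlihan (Sales)", 3, 66.67),
    ("Brian Champaigne (IT/IS)", 8, 100.00),
    ("John Smith (Sales)", 14, 78.57),
    ("Brannon Miller (Production )", 21, 38.10),
    ("David Stanley (Production )", 21, 71.43),
]

MIN_TEAM_SIZE = 10  # assumed cut-off, not taken from the notebook

comparable = [row for row in chief_rows if row[1] >= MIN_TEAM_SIZE]
too_small = [row for row in chief_rows if row[1] < MIN_TEAM_SIZE]

for chief, size, pct in comparable:
    print(f"{chief}: {pct}% Fully Meets across {size} reports")
print("Set aside as too small:", [chief for chief, _, _ in too_small])
```

The same filter could be pushed into SQL with a condition on `employees_per_chief`, keeping the final DataFrame prepared in the database as the assignment requires.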
sql_quiery = \
"""
-- Build a query and a DataFrame with performance score and the employee's manager.
-- It reuses the temporary view created earlier
-- and is used to draw the chart
SELECT
chief,
SUM(
CASE WHEN "Performance Score" = 'N/A- too early to review'
THEN percent
END)
AS "N/A- too early to review",
SUM(
CASE WHEN "Performance Score" = '90-day meets'
THEN percent
END)
AS "90-day meets",
SUM(
CASE WHEN "Performance Score" = 'Fully Meets'
THEN percent
END)
AS "Fully Meets",
SUM(
CASE WHEN "Performance Score" = 'Exceptional'
THEN percent
END)
AS "Exceptional",
SUM(
CASE WHEN "Performance Score" = 'Exceeds'
THEN percent
END)
AS "Exceeds",
SUM(
CASE WHEN "Performance Score" = 'Needs Improvement'
THEN percent
END)
AS "Needs Improvement",
SUM(
CASE WHEN "Performance Score" = 'PIP'
THEN percent
END)
AS "PIP"
FROM
PerformancePerChief
GROUP BY
chief
ORDER BY
chief
;
"""
dfg_PerformancePerChief = pd.read_sql(sql_quiery, conn, index_col="chief").fillna(0)
dfg_PerformancePerChief
| chief | N/A- too early to review | 90-day meets | Fully Meets | Exceptional | Exceeds | Needs Improvement | PIP |
|---|---|---|---|---|---|---|---|
| Alex Sweetwater (Software Engineering) | 0.00 | 22.22 | 44.44 | 11.11 | 11.11 | 11.11 | 0.00 |
| Amy Dunn (Production ) | 23.81 | 4.76 | 52.38 | 4.76 | 9.52 | 0.00 | 4.76 |
| Board of Directors (Admin Offices) | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Board of Directors (Executive Office) | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Brandon R. LeBlanc (Admin Offices) | 28.57 | 14.29 | 57.14 | 0.00 | 0.00 | 0.00 | 0.00 |
| Brannon Miller (Production ) | 0.00 | 4.76 | 38.10 | 9.52 | 23.81 | 4.76 | 19.05 |
| Brian Champaigne (IT/IS) | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| David Stanley (Production ) | 19.05 | 4.76 | 71.43 | 0.00 | 4.76 | 0.00 | 0.00 |
| Debra Houlihan (Sales) | 0.00 | 0.00 | 66.67 | 0.00 | 0.00 | 33.33 | 0.00 |
| Elijiah Gray (Production ) | 13.64 | 13.64 | 59.09 | 0.00 | 9.09 | 4.55 | 0.00 |
| Eric Dougall (IT/IS) | 0.00 | 0.00 | 75.00 | 0.00 | 25.00 | 0.00 | 0.00 |
| Janet King (Admin Offices) | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Janet King (IT/IS) | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 |
| Janet King (Production ) | 0.00 | 0.00 | 66.67 | 0.00 | 20.00 | 13.33 | 0.00 |
| Janet King (Sales) | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Jennifer Zamora (IT/IS) | 0.00 | 0.00 | 50.00 | 16.67 | 16.67 | 16.67 | 0.00 |
| Jennifer Zamora (Software Engineering) | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| John Smith (Sales) | 0.00 | 7.14 | 78.57 | 0.00 | 0.00 | 7.14 | 7.14 |
| Kelley Spirea (Production ) | 4.55 | 13.64 | 68.18 | 0.00 | 13.64 | 0.00 | 0.00 |
| Ketsia Liebig (Production ) | 9.52 | 9.52 | 66.67 | 0.00 | 9.52 | 4.76 | 0.00 |
| Kissy Sullivan (Production ) | 4.55 | 13.64 | 63.64 | 4.55 | 4.55 | 4.55 | 4.55 |
| Lynn Daneault (Sales) | 7.69 | 7.69 | 61.54 | 0.00 | 15.38 | 0.00 | 7.69 |
| Michael Albert (Production ) | 27.27 | 4.55 | 40.91 | 0.00 | 9.09 | 13.64 | 4.55 |
| Peter Monroe (IT/IS) | 35.71 | 21.43 | 42.86 | 0.00 | 0.00 | 0.00 | 0.00 |
| Simon Roup (IT/IS) | 23.53 | 29.41 | 35.29 | 11.76 | 0.00 | 0.00 | 0.00 |
| Webster Butler (Production ) | 14.29 | 14.29 | 52.38 | 0.00 | 9.52 | 9.52 | 0.00 |
data = dfg_PerformancePerChief  # data for the chart
category_names = data.columns.tolist()  # chart categories (performance scores)
labels = data.index.tolist()  # bar labels (manager and department)
data_cum = data.cumsum(axis=1)  # running totals along each row (here: percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] is the number of columns (categories) in the data
# np.linspace spreads data.shape[1] points evenly over the range 0..1
# plt.colormaps[palette](...) maps those points to RGBA values of the palette
category_colors = plt.colormaps['PuBuGn'](np.linspace(0, 1, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15, 12))  # figure and axes
ax.invert_yaxis()  # reverse the order in which the bars are listed
# ax.xaxis.set_visible(False)  # the X axis ticks are left visible
ax.set_xlim(0, data.sum(axis=1).max())  # for generality, X runs from 0 to the largest row total;
# in our case the upper limit is always 100
# Draw the categories one by one in a loop,
# taking each column and its color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of column i)
    starts = data_cum.iloc[:, i] - widths  # segment start: running total minus own length
    rects = ax.barh(labels, widths, left=starts, height=0.8,
                    label=colname, color=color)  # bar segments for this category
    # Choose the label color depending on the RGB values of the segment
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # Put value labels on the segments
    ax.bar_label(rects, label_type='center', color=text_color)
# Legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Title
ax.set_title('Performance score by manager, broken down by department, %', pad=40)
ax.title.set_size(14)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build the queries and a dataframe with performance scores and length of service
-- Approximately convert length of service to years (360 days per year)
-- Spread the performance scores across columns
WITH
Selection AS (
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS years,
"Performance Score",
"Employee Number"
FROM
hr_dataset
ORDER BY
years,
"Performance Score"
),
empl_count AS (
SELECT
years,
"Performance Score",
COUNT("Employee Number") AS employee_count
FROM
selection
GROUP BY
years,
"Performance Score"
ORDER BY
years,
"Performance Score"
),
PerformacePerYear AS (
SELECT
years,
"Performance Score",
SUM(employee_count) OVER (PARTITION BY years) AS employees_per_year_empl,
ROUND(employee_count / (SUM(employee_count) OVER (PARTITION BY years))*100, 2) AS percent
FROM
empl_count
GROUP BY
years,
employee_count,
"Performance Score"
ORDER BY
years,
"Performance Score"
),
FullyMeets AS (
SELECT
years,
percent AS "Fully Meets"
FROM PerformacePerYear
WHERE "Performance Score" = 'Fully Meets'
),
EarlyToReview AS (
SELECT
years,
percent AS "N/A- too early to review"
FROM PerformacePerYear
WHERE "Performance Score" = 'N/A- too early to review'
),
NeedsImprovement AS (
SELECT
years,
percent AS "Needs Improvement"
FROM PerformacePerYear
WHERE "Performance Score" = 'Needs Improvement'
),
NinetyDayMeets AS (
SELECT
years,
percent AS "90-day meets"
FROM PerformacePerYear
WHERE "Performance Score" = '90-day meets'
),
Exceeds AS (
SELECT
years,
percent AS "Exceeds"
FROM PerformacePerYear
WHERE "Performance Score" = 'Exceeds'
),
PIP AS (
SELECT
years,
percent AS "PIP"
FROM PerformacePerYear
WHERE "Performance Score" = 'PIP'
),
Exceptional AS (
SELECT
years,
percent AS "Exceptional"
FROM PerformacePerYear
WHERE "Performance Score" = 'Exceptional'
)
SELECT *
FROM
FullyMeets
FULL JOIN
Exceptional
USING(years)
FULL JOIN
Exceeds
USING(years)
FULL JOIN
NinetyDayMeets
USING(years)
FULL JOIN
EarlyToReview
USING(years)
FULL JOIN
NeedsImprovement
USING(years)
FULL JOIN
PIP
USING(years)
ORDER BY
years
;
"""
dfg_PerformacePerDaysEmployedPercent = pd.read_sql(sql_quiery, conn, index_col='years')
#dfg_PerformacePerDaysEmployedPercent
plot = dfg_PerformacePerDaysEmployedPercent.plot.bar(
figsize=(15,10),
fontsize=16,
stacked=True,
title='Distribution of employee performance scores by length of service in the company, %',
cmap='tab10')
plot.title.set_size(18)
plot.legend(loc=1, bbox_to_anchor=(1.25, 1), fontsize=12)
plot.set(ylabel="Share, %", xlabel="Length of service, years")
plot.set_xticks(plot.get_xticks()[::2])  # thin out the X axis ticks
plt.show()
sql_quiery = \
"""
-- Build the queries and a dataframe with performance scores and length of service
-- Approximately convert length of service to years, rounded to one decimal place
-- Count the employees for each length of service and performance score (long format)
WITH
selection AS (
SELECT
ROUND("Days Employed"::numeric / 360, 1) AS years_employed,
"Performance Score" AS performance_score,
"Employee Number"
FROM
hr_dataset
),
empl_count AS (
SELECT
years_employed,
performance_score,
COUNT("Employee Number") AS employee_count
FROM
selection
GROUP BY
years_employed,
performance_score
ORDER BY
years_employed,
performance_score
)
SELECT *
FROM empl_count
;
"""
dfg_performance_over_years_employed = pd.read_sql(sql_quiery, conn)
#dfg_performance_over_years_employed
# Build a grid of charts showing performance score against length of service in the company
# Define the order in which the charts are displayed
order_list = ['N/A- too early to review', '90-day meets', 'Fully Meets', 'Exceeds',
              'Exceptional', 'Needs Improvement', 'PIP']
# Put the data for each score into its own chart of the grid
g = sns.relplot(data=dfg_performance_over_years_employed,
x="years_employed",
y="employee_count",
col="performance_score",
col_order=order_list,
hue="performance_score",
hue_order=order_list,
kind="line",
palette="copper_r",
linewidth=4,
zorder=7,
col_wrap=2,
height=5,
aspect=1.5,
legend=False,
facet_kws=dict(sharex=False, ylim=(0,13))
)
# Set additional parameters for each chart in the grid
for performance_score, ax in g.axes_dict.items():  # iterate over the individual charts
    # Add each chart's title as an annotation placed inside the plot area
    ax.text(.75, .85, performance_score, transform=ax.transAxes, fontweight="bold")
    # Draw "shadow" lines for the other performance scores in each chart
    sns.lineplot(data=dfg_performance_over_years_employed,
                 x="years_employed",
                 y="employee_count",
                 units="performance_score",
                 estimator=None,
                 color=".7",
                 linewidth=1,
                 ax=ax
                 )
# Remove the individual chart titles
g.set_titles("")
# Use a tight layout for the charts
g.tight_layout()
# Set the overall title of the figure
g.fig.suptitle(
    "Performance score against length of service in the company",
    fontsize=16, x=0.50, y=1.03)
plt.show()
CONCLUSION
When performance scores are distributed by length of service in the company, the following trends emerge.
sql_quiery = \
"""
-- Create a temporary view, a query and a dataframe with performance scores and recruitment sources,
-- and use it to build the charts
CREATE OR REPLACE TEMPORARY VIEW
PerformancePerSource AS
SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY "Employee Source") AS employees_per_source,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY "Employee Source"))*100, 2) AS percent,
"Performance Score",
"Employee Source"
FROM
hr_dataset
GROUP BY
"Performance Score",
"Employee Source"
ORDER BY
"Performance Score",
"Employee Source"
;
SELECT * FROM PerformancePerSource
;
"""
dfg_PerformancePerSource1 = pd.read_sql(sql_quiery, conn)
dfg_PerformancePerSource1
| | empl_count | employees_per_source | percent | Performance Score | Employee Source |
|---|---|---|---|---|---|
| 0 | 1 | 16.0 | 6.25 | 90-day meets | Billboard |
| 1 | 2 | 29.0 | 6.90 | 90-day meets | Diversity Job Fair |
| 2 | 5 | 31.0 | 16.13 | 90-day meets | Employee Referral |
| 3 | 2 | 14.0 | 14.29 | 90-day meets | Glassdoor |
| 4 | 2 | 24.0 | 8.33 | 90-day meets | Monster.com |
| 5 | 4 | 18.0 | 22.22 | 90-day meets | Newspager/Magazine |
| 6 | 2 | 9.0 | 22.22 | 90-day meets | Other |
| 7 | 2 | 21.0 | 9.52 | 90-day meets | Pay Per Click - Google |
| 8 | 2 | 25.0 | 8.00 | 90-day meets | Search Engine - Google Bing Yahoo |
| 9 | 3 | 11.0 | 27.27 | 90-day meets | Social Networks - Facebook Twitter etc |
| 10 | 2 | 15.0 | 13.33 | 90-day meets | Vendor Referral |
| 11 | 2 | 13.0 | 15.38 | 90-day meets | Website Banner Ads |
| 12 | 2 | 13.0 | 15.38 | 90-day meets | Word of Mouth |
| 13 | 1 | 16.0 | 6.25 | Exceeds | Billboard |
| 14 | 5 | 29.0 | 17.24 | Exceeds | Diversity Job Fair |
| 15 | 2 | 31.0 | 6.45 | Exceeds | Employee Referral |
| 16 | 1 | 14.0 | 7.14 | Exceeds | Glassdoor |
| 17 | 1 | 4.0 | 25.00 | Exceeds | Information Session |
| 18 | 3 | 17.0 | 17.65 | Exceeds | MBTA ads |
| 19 | 2 | 24.0 | 8.33 | Exceeds | Monster.com |
| 20 | 1 | 12.0 | 8.33 | Exceeds | On-campus Recruiting |
| 21 | 2 | 9.0 | 22.22 | Exceeds | Other |
| 22 | 3 | 21.0 | 14.29 | Exceeds | Pay Per Click - Google |
| 23 | 4 | 20.0 | 20.00 | Exceeds | Professional Society |
| 24 | 1 | 25.0 | 4.00 | Exceeds | Search Engine - Google Bing Yahoo |
| 25 | 1 | 11.0 | 9.09 | Exceeds | Social Networks - Facebook Twitter etc |
| 26 | 1 | 13.0 | 7.69 | Exceeds | Website Banner Ads |
| 27 | 1 | 16.0 | 6.25 | Exceptional | Billboard |
| 28 | 1 | 29.0 | 3.45 | Exceptional | Diversity Job Fair |
| 29 | 3 | 31.0 | 9.68 | Exceptional | Employee Referral |
| 30 | 2 | 17.0 | 11.76 | Exceptional | MBTA ads |
| 31 | 2 | 20.0 | 10.00 | Exceptional | Professional Society |
| 32 | 10 | 16.0 | 62.50 | Fully Meets | Billboard |
| 33 | 1 | 1.0 | 100.00 | Fully Meets | Careerbuilder |
| 34 | 1 | 1.0 | 100.00 | Fully Meets | Company Intranet - Partner |
| 35 | 14 | 29.0 | 48.28 | Fully Meets | Diversity Job Fair |
| 36 | 16 | 31.0 | 51.61 | Fully Meets | Employee Referral |
| 37 | 9 | 14.0 | 64.29 | Fully Meets | Glassdoor |
| 38 | 8 | 8.0 | 100.00 | Fully Meets | Indeed |
| 39 | 2 | 4.0 | 50.00 | Fully Meets | Information Session |
| 40 | 4 | 6.0 | 66.67 | Fully Meets | Internet Search |
| 41 | 9 | 17.0 | 52.94 | Fully Meets | MBTA ads |
| 42 | 15 | 24.0 | 62.50 | Fully Meets | Monster.com |
| 43 | 10 | 18.0 | 55.56 | Fully Meets | Newspager/Magazine |
| 44 | 9 | 12.0 | 75.00 | Fully Meets | On-campus Recruiting |
| 45 | 1 | 1.0 | 100.00 | Fully Meets | On-line Web application |
| 46 | 3 | 9.0 | 33.33 | Fully Meets | Other |
| 47 | 12 | 21.0 | 57.14 | Fully Meets | Pay Per Click - Google |
| 48 | 9 | 20.0 | 45.00 | Fully Meets | Professional Society |
| 49 | 19 | 25.0 | 76.00 | Fully Meets | Search Engine - Google Bing Yahoo |
| 50 | 6 | 11.0 | 54.55 | Fully Meets | Social Networks - Facebook Twitter etc |
| 51 | 9 | 15.0 | 60.00 | Fully Meets | Vendor Referral |
| 52 | 6 | 13.0 | 46.15 | Fully Meets | Website Banner Ads |
| 53 | 8 | 13.0 | 61.54 | Fully Meets | Word of Mouth |
| 54 | 1 | 16.0 | 6.25 | N/A- too early to review | Billboard |
| 55 | 3 | 29.0 | 10.34 | N/A- too early to review | Diversity Job Fair |
| 56 | 5 | 31.0 | 16.13 | N/A- too early to review | Employee Referral |
| 57 | 1 | 14.0 | 7.14 | N/A- too early to review | Glassdoor |
| 58 | 1 | 4.0 | 25.00 | N/A- too early to review | Information Session |
| 59 | 1 | 6.0 | 16.67 | N/A- too early to review | Internet Search |
| 60 | 2 | 24.0 | 8.33 | N/A- too early to review | Monster.com |
| 61 | 4 | 18.0 | 22.22 | N/A- too early to review | Newspager/Magazine |
| 62 | 2 | 12.0 | 16.67 | N/A- too early to review | On-campus Recruiting |
| 63 | 1 | 9.0 | 11.11 | N/A- too early to review | Other |
| 64 | 1 | 1.0 | 100.00 | N/A- too early to review | Pay Per Click |
| 65 | 2 | 21.0 | 9.52 | N/A- too early to review | Pay Per Click - Google |
| 66 | 3 | 20.0 | 15.00 | N/A- too early to review | Professional Society |
| 67 | 1 | 25.0 | 4.00 | N/A- too early to review | Search Engine - Google Bing Yahoo |
| 68 | 1 | 11.0 | 9.09 | N/A- too early to review | Social Networks - Facebook Twitter etc |
| 69 | 4 | 15.0 | 26.67 | N/A- too early to review | Vendor Referral |
| 70 | 2 | 13.0 | 15.38 | N/A- too early to review | Website Banner Ads |
| 71 | 2 | 13.0 | 15.38 | N/A- too early to review | Word of Mouth |
| 72 | 1 | 16.0 | 6.25 | Needs Improvement | Billboard |
| 73 | 3 | 29.0 | 10.34 | Needs Improvement | Diversity Job Fair |
| 74 | 1 | 14.0 | 7.14 | Needs Improvement | Glassdoor |
| 75 | 1 | 6.0 | 16.67 | Needs Improvement | Internet Search |
| 76 | 2 | 17.0 | 11.76 | Needs Improvement | MBTA ads |
| 77 | 3 | 24.0 | 12.50 | Needs Improvement | Monster.com |
| 78 | 1 | 9.0 | 11.11 | Needs Improvement | Other |
| 79 | 1 | 21.0 | 4.76 | Needs Improvement | Pay Per Click - Google |
| 80 | 1 | 25.0 | 4.00 | Needs Improvement | Search Engine - Google Bing Yahoo |
| 81 | 1 | 13.0 | 7.69 | Needs Improvement | Word of Mouth |
| 82 | 1 | 16.0 | 6.25 | PIP | Billboard |
| 83 | 1 | 29.0 | 3.45 | PIP | Diversity Job Fair |
| 84 | 1 | 17.0 | 5.88 | PIP | MBTA ads |
| 85 | 1 | 21.0 | 4.76 | PIP | Pay Per Click - Google |
| 86 | 2 | 20.0 | 10.00 | PIP | Professional Society |
| 87 | 1 | 25.0 | 4.00 | PIP | Search Engine - Google Bing Yahoo |
| 88 | 2 | 13.0 | 15.38 | PIP | Website Banner Ads |
# Build a grid of charts showing performance score against employee recruitment source
g=sns.catplot(
kind="bar",
x="Employee Source",
y="percent",
hue="Employee Source",
data=dfg_PerformancePerSource1,
col_order=['N/A- too early to review', '90-day meets',
'Fully Meets', 'Exceptional', 'Exceeds', 'Needs Improvement', 'PIP'],
col="Performance Score",
col_wrap=4,
height=4,
aspect = 1,
palette="Set2",
margin_titles=True,
sharex=False,
sharey=False,
dodge=False
)
g.fig.suptitle(
    "Employee performance score against recruitment source",
    fontsize=16, x=0.50, y=1.03)
# Set the X axis label and its font size
g.set_xlabels('Recruitment source', fontsize=12)
# Set the Y axis label and its font size
g.set_ylabels("Percent of Source", fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=90,fontsize=10)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=1.65)
plt.show()
Let us present the same data in a different form.
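The reshaping from one row per (source, score) pair to one row per source can be sketched on a small scale. The snippet below is only an illustration under stated assumptions: it runs on an in-memory SQLite database instead of the PostgreSQL server used in this notebook, the table name `long_format` is hypothetical, and the rows are a small subset copied from the long-format table above. It shows conditional aggregation, that is, `SUM(CASE WHEN ... END)` per target column.

```python
import sqlite3

# A few rows copied from the long-format table above: (source, score, percent)
rows = [
    ("Billboard", "90-day meets", 6.25),
    ("Billboard", "Exceeds", 6.25),
    ("Billboard", "Fully Meets", 62.50),
    ("Glassdoor", "90-day meets", 14.29),
    ("Glassdoor", "Exceeds", 7.14),
    ("Glassdoor", "Fully Meets", 64.29),
]

con = sqlite3.connect(":memory:")
# Hypothetical table name, used only for this sketch
con.execute("CREATE TABLE long_format (source TEXT, score TEXT, percent REAL)")
con.executemany("INSERT INTO long_format VALUES (?, ?, ?)", rows)

# Each CASE keeps the percent of one score and yields NULL otherwise;
# SUM ignores NULLs, so every output column holds that score's share.
query = """
SELECT
    source,
    SUM(CASE WHEN score = '90-day meets' THEN percent END) AS "90-day meets",
    SUM(CASE WHEN score = 'Fully Meets' THEN percent END) AS "Fully Meets",
    SUM(CASE WHEN score = 'Exceeds' THEN percent END) AS "Exceeds"
FROM long_format
GROUP BY source
ORDER BY source
"""
wide = con.execute(query).fetchall()
for row in wide:
    print(row)
con.close()
```

A score that is absent for a source comes out as NULL in this scheme, which is why the notebook fills missing values with 0 after loading the result.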
sql_quiery = \
"""
-- Build a query and a dataframe with performance score, recruitment source and average cost per hire
-- Reuse the temporary view created earlier
-- Use the result to build the chart
WITH
empl_source_price AS
(SELECT
"Employment Source" AS "Employee Source",
"Total"
FROM
recruiting_costs
)
SELECT
CONCAT("Employee Source", ': $', ROUND("Total"::numeric/employees_per_source, 2)) AS "Employee Source",
ROUND("Total"::numeric/employees_per_source) AS "Average price",
SUM(
CASE WHEN "Performance Score" = 'N/A- too early to review'
THEN percent
END)
AS "N/A- too early to review",
SUM(
CASE WHEN "Performance Score" = '90-day meets'
THEN percent
END)
AS "90-day meets",
SUM(
CASE WHEN "Performance Score" = 'Fully Meets'
THEN percent
END)
AS "Fully Meets",
SUM(
CASE WHEN "Performance Score" = 'Exceptional'
THEN percent
END)
AS "Exceptional",
SUM(
CASE WHEN "Performance Score" = 'Exceeds'
THEN percent
END)
AS "Exceeds",
SUM(
CASE WHEN "Performance Score" = 'Needs Improvement'
THEN percent
END)
AS "Needs Improvement",
SUM(
CASE WHEN "Performance Score" = 'PIP'
THEN percent
END)
AS "PIP"
FROM
PerformancePerSource
LEFT JOIN
empl_source_price
USING("Employee Source")
GROUP BY
"Employee Source",
"Total",
employees_per_source
ORDER BY
"Average price" DESC,
"Employee Source"
"""
dfg_PerformancePerSource2 = pd.read_sql(sql_quiery, conn, index_col=["Employee Source"]).fillna(0)
dfg_PerformancePerSource2
| Employee Source | Average price | N/A- too early to review | 90-day meets | Fully Meets | Exceptional | Exceeds | Needs Improvement | PIP |
|---|---|---|---|---|---|---|---|---|
| Indeed: $ | 0.0 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Careerbuilder: $7790.00 | 7790.0 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Pay Per Click: $1323.00 | 1323.0 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| MBTA ads: $645.88 | 646.0 | 0.00 | 0.00 | 52.94 | 11.76 | 17.65 | 11.76 | 5.88 |
| On-campus Recruiting: $625.00 | 625.0 | 16.67 | 0.00 | 75.00 | 0.00 | 8.33 | 0.00 | 0.00 |
| Website Banner Ads: $549.46 | 549.0 | 15.38 | 15.38 | 46.15 | 0.00 | 7.69 | 0.00 | 15.38 |
| Social Networks - Facebook Twitter etc: $506.64 | 507.0 | 9.09 | 27.27 | 54.55 | 0.00 | 9.09 | 0.00 | 0.00 |
| Newspager/Magazine: $460.61 | 461.0 | 22.22 | 22.22 | 55.56 | 0.00 | 0.00 | 0.00 | 0.00 |
| Other: $443.89 | 444.0 | 11.11 | 22.22 | 33.33 | 0.00 | 22.22 | 11.11 | 0.00 |
| Billboard: $387.00 | 387.0 | 6.25 | 6.25 | 62.50 | 6.25 | 6.25 | 6.25 | 6.25 |
| Diversity Job Fair: $345.55 | 346.0 | 10.34 | 6.90 | 48.28 | 3.45 | 17.24 | 10.34 | 3.45 |
| Monster.com: $240.00 | 240.0 | 8.33 | 8.33 | 62.50 | 0.00 | 8.33 | 12.50 | 0.00 |
| Search Engine - Google Bing Yahoo: $207.32 | 207.0 | 4.00 | 8.00 | 76.00 | 0.00 | 4.00 | 4.00 | 4.00 |
| Pay Per Click - Google: $167.10 | 167.0 | 9.52 | 9.52 | 57.14 | 0.00 | 14.29 | 4.76 | 4.76 |
| Professional Society: $60.00 | 60.0 | 15.00 | 0.00 | 45.00 | 10.00 | 20.00 | 0.00 | 10.00 |
| Company Intranet - Partner: $0.00 | 0.0 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Employee Referral: $0.00 | 0.0 | 16.13 | 16.13 | 51.61 | 9.68 | 6.45 | 0.00 | 0.00 |
| Glassdoor: $0.00 | 0.0 | 7.14 | 14.29 | 64.29 | 0.00 | 7.14 | 7.14 | 0.00 |
| Information Session: $0.00 | 0.0 | 25.00 | 0.00 | 50.00 | 0.00 | 25.00 | 0.00 | 0.00 |
| Internet Search: $0.00 | 0.0 | 16.67 | 0.00 | 66.67 | 0.00 | 0.00 | 16.67 | 0.00 |
| On-line Web application: $0.00 | 0.0 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Vendor Referral: $0.00 | 0.0 | 26.67 | 13.33 | 60.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Word of Mouth: $0.00 | 0.0 | 15.38 | 15.38 | 61.54 | 0.00 | 0.00 | 7.69 | 0.00 |
data = dfg_PerformancePerSource2.drop(columns=["Average price"])  # data for the chart
category_names = data.columns.tolist()  # chart categories (performance scores)
labels = data.index.tolist()  # bar labels (recruitment source and average cost)
data_cum = data.cumsum(axis=1)  # running totals along each row (here: percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] is the number of columns (categories) in the data
# np.linspace spreads data.shape[1] points evenly over the range 0..1
# plt.colormaps[palette](...) maps those points to RGBA values of the palette
category_colors = plt.colormaps['ocean_r'](np.linspace(0, 1, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15, 12))  # figure and axes
ax.invert_yaxis()  # reverse the order in which the bars are listed
# ax.xaxis.set_visible(False)  # the X axis ticks are left visible
ax.set_xlim(0, data.sum(axis=1).max())  # for generality, X runs from 0 to the largest row total;
# in our case the upper limit is always 100
# Draw the categories one by one in a loop,
# taking each column and its color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of column i)
    starts = data_cum.iloc[:, i] - widths  # segment start: running total minus own length
    rects = ax.barh(labels, widths, left=starts, height=0.8,
                    label=colname, color=color)  # bar segments for this category
    # Choose the label color depending on the RGB values of the segment
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # Put value labels on the segments
    ax.bar_label(rects, label_type='center', color=text_color)
# Legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Title
ax.set_title('Performance score by recruitment source, with the average cost per hire, %', pad=40)
ax.title.set_size(14)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build the queries and a dataframe with performance scores and KPIs,
-- and use it to build the charts
WITH
kpi_s AS
(SELECT
"Employee Name",
"Performance Score",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Performance Score",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Performance Score",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Performance Score",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
)
SELECT
*
FROM
kpi_s
;
"""
dfg_performance_over_KPI = pd.read_sql(sql_quiery, conn).dropna()
dfg_performance_over_KPI
| | Employee Name | Performance Score | KPI_Name | KPI_Value |
|---|---|---|---|---|
| 0 | Albert, Michael | Fully Meets | Abutments/Hour Wk 1 | 0.0 |
| 1 | Bozzi, Charles | Fully Meets | Abutments/Hour Wk 1 | 0.0 |
| 2 | Butler, Webster L | Exceeds | Abutments/Hour Wk 1 | 0.0 |
| 3 | Dunn, Amy | Fully Meets | Abutments/Hour Wk 1 | 0.0 |
| 4 | Gray, Elijiah | Fully Meets | Abutments/Hour Wk 1 | 0.0 |
| ... | ... | ... | ... | ... |
| 971 | Thibaud, Kenneth | Fully Meets | 90-day Complaints | 0.0 |
| 972 | Trzeciak, Cybil | Fully Meets | 90-day Complaints | 1.0 |
| 973 | Walker, Roger | Fully Meets | 90-day Complaints | 0.0 |
| 974 | Winthrop, Jordan | Exceeds | 90-day Complaints | 0.0 |
| 975 | Wolk, Hang T | Fully Meets | 90-day Complaints | 0.0 |
832 rows × 4 columns
score_list = ['N/A- too early to review', '90-day meets',
'Fully Meets', 'Exceptional', 'Exceeds', 'Needs Improvement', 'PIP']
g=sns.catplot(x="KPI_Name",
y="KPI_Value",
hue="KPI_Name",
data=dfg_performance_over_KPI,
col="Performance Score",
col_wrap=3,
col_order=score_list,
kind='boxen',
height=5,
aspect=1,
palette="Set1",
sharex=False,
sharey=True,
dodge=False
)
g.fig.suptitle(
    "Employee performance score against KPIs (Production department)",
    fontsize=16, x=0.50, y=1.03)
# Set the X axis label and its font size
g.set_xlabels('KPI Name', fontsize=12)
# Set the Y axis label and its font size
g.set_ylabels('KPI Value', fontsize=12)
# Set the size and rotation of the tick labels under the boxes
g.set_xticklabels(rotation=45,fontsize=10)
# Set the Y axis ticks
plt.yticks(list(range(0, 20, 1)))
# Adjust the vertical spacing between the individual charts
g.fig.subplots_adjust(hspace=0.4)
plt.show()
CONCLUSION
Overall, the relationship between performance scores and KPIs follows a pattern: the higher the output and the fewer the errors, the better the score. However, the relationship is not unambiguous everywhere, and some individual values raise questions.
sql_quiery = \
"""
-- Build a query and a dataframe with performance score and employee age
SELECT
age,
"Performance Score",
COUNT("Employee Number") AS employee_count
FROM
hr_dataset
GROUP BY
age,
"Performance Score"
ORDER BY
age,
"Performance Score"
;
"""
dfg_PerformacePerAge = pd.read_sql(sql_quiery, conn)
#dfg_PerformacePerAge
# Plot performance score against employee age
g=sns.relplot(
x="age",
y="employee_count",
hue="Performance Score",
data=dfg_PerformacePerAge,
row=1,
col=1,
kind='line',
palette="tab10",
legend='auto',
height=5,
aspect=2,
alpha=1)
g.fig.suptitle(
    "Performance score against employee age",
    fontsize=16, x=0.43, y=1.03)
# Set the font size of the X axis label
g.set_xlabels(fontsize=10)
# Set the font size of the Y axis label
g.set_ylabels(fontsize=10)
# Set the size and rotation of the tick labels
g.set_xticklabels(rotation=0,fontsize=9)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=0.7)
plt.show()
The number of employees of the same age varies across the age range. For a fair comparison, it therefore makes sense to express the number of employees with each performance score as a percentage of their age group.
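As a small-scale illustration of a within-group percentage, here is a minimal sketch. It makes several assumptions that are not part of the notebook: it uses an in-memory SQLite database rather than the PostgreSQL server, the table `staff` is hypothetical, and its rows are toy values invented for the example (they are not taken from `hr_dataset`). The idea is the same as in the query that follows: count employees per (age, score) pair, then divide each count by the total of its age group obtained with a window function.

```python
import sqlite3

# Toy rows invented for illustration only (not from hr_dataset): (age, score)
rows = [
    (30, "Fully Meets"), (30, "Fully Meets"), (30, "Exceeds"), (30, "PIP"),
    (45, "Fully Meets"), (45, "Exceeds"),
]

con = sqlite3.connect(":memory:")
# Hypothetical table name, used only for this sketch
con.execute("CREATE TABLE staff (age INTEGER, score TEXT)")
con.executemany("INSERT INTO staff VALUES (?, ?)", rows)

# Step 1 (CTE): count employees per age and score.
# Step 2: divide each count by the sum of counts within the same age.
query = """
WITH counts AS (
    SELECT age, score, COUNT(*) AS employee_count
    FROM staff
    GROUP BY age, score
)
SELECT
    age,
    score,
    employee_count,
    ROUND(employee_count * 100.0
          / SUM(employee_count) OVER (PARTITION BY age), 2) AS percent
FROM counts
ORDER BY age, score
"""
shares = con.execute(query).fetchall()
for row in shares:
    print(row)
con.close()
```

Within each age group the shares add up to 100, so groups of different size become directly comparable.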
sql_quiery = \
"""
-- Build the queries and a dataframe with performance scores and employee age
-- Spread the performance scores across columns
WITH
PerformacePerAge AS (
SELECT
age,
"Performance Score",
COUNT("Employee Number") AS employee_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY age) AS employees_per_age,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY age))*100, 2)
AS percent
FROM
hr_dataset
GROUP BY
age,
"Performance Score"
ORDER BY
age,
"Performance Score"
),
FullyMeats AS (
SELECT
age,
percent AS "Fully Meets"
FROM PerformacePerAge
WHERE "Performance Score" = 'Fully Meets'
),
EarlyToReview AS (
SELECT
age,
percent AS "N/A- too early to review"
FROM PerformacePerAge
WHERE "Performance Score" = 'N/A- too early to review'
),
NeedsImprovement AS (
SELECT
age,
percent AS "Needs Improvement"
FROM PerformacePerAge
WHERE "Performance Score" = 'Needs Improvement'
),
NinetyDayMeets AS (
SELECT
age,
percent AS "90-day meets"
FROM PerformacePerAge
WHERE "Performance Score" = '90-day meets'
),
Exceeds AS (
SELECT
age,
percent AS "Exceeds"
FROM PerformacePerAge
WHERE "Performance Score" = 'Exceeds'
),
PIP AS (
SELECT
age,
percent AS "PIP"
FROM PerformacePerAge
WHERE "Performance Score" = 'PIP'
),
Exceptional AS (
SELECT
age,
percent AS "Exceptional"
FROM PerformacePerAge
WHERE "Performance Score" = 'Exceptional'
)
SELECT *
FROM
FullyMeats
FULL JOIN
Exceptional
USING(age)
FULL JOIN
Exceeds
USING(age)
FULL JOIN
NinetyDayMeets
USING(age)
FULL JOIN
EarlyToReview
USING(age)
FULL JOIN
NeedsImprovement
USING(age)
FULL JOIN
PIP
USING(age)
ORDER BY
age
;
"""
dfg_PerformacePerAgePercent = pd.read_sql(sql_quiery, conn, index_col='age')
#dfg_PerformacePerAgePercent
plot = dfg_PerformacePerAgePercent.plot.bar(
figsize=(15,10),
fontsize=16,
stacked=True,
title='Employee performance scores by age, %',
cmap='tab10')
plot.title.set_size(18)
plot.legend(loc=1, bbox_to_anchor=(1.25, 1), fontsize=12)
plot.set(ylabel="Share, %", xlabel="Age")
plt.show()
CONCLUSION
When performance scores are distributed by employee age, the following trends emerge.
sql_quiery = \
"""
-- Create a temporary view, a query and a dataframe with performance scores and employee sex,
-- and use it to build the grid of charts
CREATE OR REPLACE TEMP VIEW
PerformancePerSex AS
SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY sex) AS employees_per_sex,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY sex))*100, 2) AS percent,
"Performance Score",
sex
FROM
hr_dataset
GROUP BY
"Performance Score",
sex
ORDER BY
"Performance Score",
sex
;
SELECT * FROM PerformancePerSex
"""
dfg_PerformancePerSex1 = pd.read_sql(sql_quiery, conn)
dfg_PerformancePerSex1
| | empl_count | employees_per_sex | percent | Performance Score | sex |
|---|---|---|---|---|---|
| 0 | 19 | 177.0 | 10.73 | 90-day meets | Female |
| 1 | 12 | 133.0 | 9.02 | 90-day meets | Male |
| 2 | 16 | 177.0 | 9.04 | Exceeds | Female |
| 3 | 12 | 133.0 | 9.02 | Exceeds | Male |
| 4 | 5 | 177.0 | 2.82 | Exceptional | Female |
| 5 | 4 | 133.0 | 3.01 | Exceptional | Male |
| 6 | 101 | 177.0 | 57.06 | Fully Meets | Female |
| 7 | 80 | 133.0 | 60.15 | Fully Meets | Male |
| 8 | 26 | 177.0 | 14.69 | N/A- too early to review | Female |
| 9 | 11 | 133.0 | 8.27 | N/A- too early to review | Male |
| 10 | 5 | 177.0 | 2.82 | Needs Improvement | Female |
| 11 | 10 | 133.0 | 7.52 | Needs Improvement | Male |
| 12 | 5 | 177.0 | 2.82 | PIP | Female |
| 13 | 4 | 133.0 | 3.01 | PIP | Male |
As the table shows, although in absolute terms there are more women than men with, say, a "Fully Meets" score (101 vs. 80), in percentage terms the situation is reversed (57.06% vs. 60.15%). This is because the staff is not split evenly 50/50 by sex: there are more women in the company (177 vs. 133). It is therefore fair to examine the relationship between performance score and sex not in absolute numbers but as a percentage within each subgroup.
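The reversal can be checked by hand with the counts from the table above. The short sketch below only repeats that arithmetic; the dictionary names are illustrative and do not appear elsewhere in the notebook.

```python
# Counts copied from the table above for the "Fully Meets" score
fully_meets = {"Female": 101, "Male": 80}
# Total headcount per sex, also from the table (employees_per_sex)
headcount = {"Female": 177, "Male": 133}

# Share of "Fully Meets" within each sex, in percent
shares = {
    sex: round(fully_meets[sex] / headcount[sex] * 100, 2)
    for sex in fully_meets
}
print(shares)

# More women in absolute terms, but a larger share among men
print(fully_meets["Female"] > fully_meets["Male"], shares["Female"] < shares["Male"])
```

The computed shares match the `percent` column of the table, which is what the SQL view calculates for every score.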
# Build a grid of charts showing performance score against employee sex
g=sns.catplot(
kind="bar",
x="sex",
y="percent",
hue="sex",
data=dfg_PerformancePerSex1,
col_order=['N/A- too early to review', '90-day meets',
'Fully Meets', 'Exceptional', 'Exceeds', 'Needs Improvement', 'PIP'],
col="Performance Score",
col_wrap=4,
height=4,
aspect = 1,
palette="Pastel1",
margin_titles=True,
sharex=False,
sharey=False,
dodge=False
)
g.fig.suptitle(
    "Performance score against employee sex",
    fontsize=16, x=0.50, y=1.03)
# Set the font size of the X axis label
g.set_xlabels(fontsize=12)
# Set the font size of the Y axis label
g.set_ylabels(fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=0,fontsize=10)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=0.2)
plt.show()
Let us present the same data in a different form.
sql_quiery = \
"""
-- Build a query and a dataframe with performance scores and employee sex
-- Reuse the temporary view created earlier
-- Use the result to build the chart
SELECT
sex,
SUM(
CASE WHEN "Performance Score" = 'N/A- too early to review'
THEN percent
END)
AS "N/A- too early to review",
SUM(
CASE WHEN "Performance Score" = '90-day meets'
THEN percent
END)
AS "90-day meets",
SUM(
CASE WHEN "Performance Score" = 'Fully Meets'
THEN percent
END)
AS "Fully Meets",
SUM(
CASE WHEN "Performance Score" = 'Exceptional'
THEN percent
END)
AS "Exceptional",
SUM(
CASE WHEN "Performance Score" = 'Exceeds'
THEN percent
END)
AS "Exceeds",
SUM(
CASE WHEN "Performance Score" = 'Needs Improvement'
THEN percent
END)
AS "Needs Improvement",
SUM(
CASE WHEN "Performance Score" = 'PIP'
THEN percent
END)
AS "PIP"
FROM
PerformancePerSex
GROUP BY
sex
"""
dfg_PerformancePerSex2 = pd.read_sql(sql_quiery, conn, index_col='sex')
dfg_PerformancePerSex2
| sex | N/A- too early to review | 90-day meets | Fully Meets | Exceptional | Exceeds | Needs Improvement | PIP |
|---|---|---|---|---|---|---|---|
| Male | 8.27 | 9.02 | 60.15 | 3.01 | 9.02 | 7.52 | 3.01 |
| Female | 14.69 | 10.73 | 57.06 | 2.82 | 9.04 | 2.82 | 2.82 |
data = dfg_PerformancePerSex2  # data for the chart
category_names = data.columns.tolist()  # list of chart categories
labels = data.index.tolist()  # list of bar labels
data_cum = data.cumsum(axis=1)  # DF with the running total along each row (here: percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] - the number of columns in our data
# np.linspace - spreads data.shape[1] values evenly over the range from 0 to 1
# plt.colormaps[palette](...) - returns the RGBA values of the palette at the np.linspace positions
category_colors = plt.colormaps['PuRd'](np.linspace(0, 1, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15,2))  # create the figure and axes
# ax.invert_yaxis() # no need to invert the order of the bars
# ax.xaxis.set_visible(False) # the X-axis tick labels stay visible
ax.set_xlim(0, data.sum(axis=1).max())  # to keep the code generic, set the X-axis limits
# from 0 to the largest row total. Here the upper limit is about 100 (up to rounding)
# Draw the categories one after another in a loop,
# taking the column and its color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of column i in data)
    starts = data_cum.iloc[:, i] - widths  # left edge of each segment: running total minus its own width
    rects = ax.barh(labels, widths, left=starts, height=0.7,
                    label=colname, color=color)  # draw the rectangles for this category
    # choose the label color depending on the RGB values of the segment:
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # put value labels on the rectangles
    ax.bar_label(rects, label_type='center', color=text_color)
# Add the legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Set the title
ax.set_title('Employee performance score by sex, %', pad=40)
ax.title.set_size(14)
plt.show()
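The left edge of each segment in the stacked chart is the running total of the row minus the segment's own width. A minimal standard-library sketch of that arithmetic, using the Male row from the table above (the variable names are illustrative and not part of the notebook):

```python
from itertools import accumulate

# Percentages for the "Male" row, in column order
widths = [8.27, 9.02, 60.15, 3.01, 9.02, 7.52, 3.01]

# Running total across the row; plays the role of data.cumsum(axis=1)
running = list(accumulate(widths))

# Each segment starts where the previous one ended
starts = [round(total - width, 2) for total, width in zip(running, widths)]

print(starts)
```

The first segment starts at 0 and every following one starts at the sum of the widths before it, which is what `left=starts` passes to `ax.barh`.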
CONCLUSION
sql_quiery = \
"""
-- Create a temporary view, a query and a dataframe
-- with the performance score and race/ethnicity of employees.
-- It is used to build the grid of charts
CREATE OR REPLACE TEMPORARY VIEW
PerformancePerRace AS
SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY racedesc) AS employees_per_race,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY racedesc))*100, 2) AS percent,
"Performance Score",
racedesc
FROM
hr_dataset
GROUP BY
"Performance Score",
racedesc
ORDER BY
"Performance Score",
racedesc
;
SELECT * FROM PerformancePerRace
"""
dfg_PerformancePerRace1 = pd.read_sql(sql_quiery, conn)
dfg_PerformancePerRace1
| | empl_count | employees_per_race | percent | Performance Score | racedesc |
|---|---|---|---|---|---|
| 0 | 4 | 34.0 | 11.76 | 90-day meets | Asian |
| 1 | 3 | 57.0 | 5.26 | 90-day meets | Black or African American |
| 2 | 1 | 18.0 | 5.56 | 90-day meets | Two or more races |
| 3 | 23 | 193.0 | 11.92 | 90-day meets | White |
| 4 | 2 | 4.0 | 50.00 | Exceeds | American Indian or Alaska Native |
| 5 | 3 | 34.0 | 8.82 | Exceeds | Asian |
| 6 | 2 | 57.0 | 3.51 | Exceeds | Black or African American |
| 7 | 3 | 18.0 | 16.67 | Exceeds | Two or more races |
| 8 | 18 | 193.0 | 9.33 | Exceeds | White |
| 9 | 1 | 34.0 | 2.94 | Exceptional | Asian |
| 10 | 3 | 57.0 | 5.26 | Exceptional | Black or African American |
| 11 | 5 | 193.0 | 2.59 | Exceptional | White |
| 12 | 2 | 4.0 | 50.00 | Fully Meets | American Indian or Alaska Native |
| 13 | 20 | 34.0 | 58.82 | Fully Meets | Asian |
| 14 | 37 | 57.0 | 64.91 | Fully Meets | Black or African American |
| 15 | 2 | 4.0 | 50.00 | Fully Meets | Hispanic |
| 16 | 10 | 18.0 | 55.56 | Fully Meets | Two or more races |
| 17 | 110 | 193.0 | 56.99 | Fully Meets | White |
| 18 | 4 | 34.0 | 11.76 | N/A- too early to review | Asian |
| 19 | 4 | 57.0 | 7.02 | N/A- too early to review | Black or African American |
| 20 | 1 | 4.0 | 25.00 | N/A- too early to review | Hispanic |
| 21 | 2 | 18.0 | 11.11 | N/A- too early to review | Two or more races |
| 22 | 26 | 193.0 | 13.47 | N/A- too early to review | White |
| 23 | 1 | 34.0 | 2.94 | Needs Improvement | Asian |
| 24 | 7 | 57.0 | 12.28 | Needs Improvement | Black or African American |
| 25 | 1 | 4.0 | 25.00 | Needs Improvement | Hispanic |
| 26 | 1 | 18.0 | 5.56 | Needs Improvement | Two or more races |
| 27 | 5 | 193.0 | 2.59 | Needs Improvement | White |
| 28 | 1 | 34.0 | 2.94 | PIP | Asian |
| 29 | 1 | 57.0 | 1.75 | PIP | Black or African American |
| 30 | 1 | 18.0 | 5.56 | PIP | Two or more races |
| 31 | 6 | 193.0 | 3.11 | PIP | White |
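The `percent` column above is each cell count divided by the total for its race group; in the query that total comes from `SUM(COUNT(...)) OVER (PARTITION BY racedesc)`. A small standard-library sketch of the same arithmetic for the Asian rows of the table (the variable names are illustrative):

```python
# Employee counts per performance score for the "Asian" group, copied from the table
asian_counts = {
    "90-day meets": 4,
    "Exceeds": 3,
    "Exceptional": 1,
    "Fully Meets": 20,
    "N/A- too early to review": 4,
    "Needs Improvement": 1,
    "PIP": 1,
}

# Group total: the role of SUM(COUNT(...)) OVER (PARTITION BY racedesc)
group_total = sum(asian_counts.values())

# Share of each score within the group, in percent, rounded as in the query
percent = {score: round(count / group_total * 100, 2)
           for score, count in asian_counts.items()}

print(group_total)
print(percent["Fully Meets"])
```

For the groups with only four employees, one person changes a share by 25 percentage points, which is worth keeping in mind when reading the charts.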
# Build a grid of charts showing the performance score by employees' race/ethnicity
g=sns.catplot(
kind="bar",
x="racedesc",
y="percent",
hue="racedesc",
data=dfg_PerformancePerRace1,
col_order=['N/A- too early to review', '90-day meets',
'Fully Meets', 'Exceptional', 'Exceeds', 'Needs Improvement', 'PIP'],
col="Performance Score",
col_wrap=4,
height=4,
aspect = 1,
palette="Set1",
margin_titles=True,
sharex=False,
sharey=False,
dodge=False
)
g.fig.suptitle(
    "Performance score by employees' race/ethnicity",
    fontsize=16, x=0.50, y=1.03)
# Set the X-axis label and its font size
g.set_xlabels('Race / ethnicity', fontsize=12)
# Set the Y-axis label font size
g.set_ylabels(fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=75, fontsize=10)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=0.8)
plt.show()
Let us present the same data in a different form.
sql_quiery = \
"""
-- Create a query and a dataframe with the performance score and race/ethnicity of employees.
-- Use the temporary view created earlier
-- The result is used to build the chart
SELECT
racedesc,
SUM(
CASE WHEN "Performance Score" = 'N/A- too early to review'
THEN percent
END)
AS "N/A- too early to review",
SUM(
CASE WHEN "Performance Score" = '90-day meets'
THEN percent
END)
AS "90-day meets",
SUM(
CASE WHEN "Performance Score" = 'Fully Meets'
THEN percent
END)
AS "Fully Meets",
SUM(
CASE WHEN "Performance Score" = 'Exceptional'
THEN percent
END)
AS "Exceptional",
SUM(
CASE WHEN "Performance Score" = 'Exceeds'
THEN percent
END)
AS "Exceeds",
SUM(
CASE WHEN "Performance Score" = 'Needs Improvement'
THEN percent
END)
AS "Needs Improvement",
SUM(
CASE WHEN "Performance Score" = 'PIP'
THEN percent
END)
AS "PIP"
FROM
PerformancePerRace
GROUP BY
racedesc
ORDER BY
racedesc
"""
dfg_PerformancePerRace2 = pd.read_sql(sql_quiery, conn, index_col='racedesc').fillna(0)
dfg_PerformancePerRace2
| racedesc | N/A- too early to review | 90-day meets | Fully Meets | Exceptional | Exceeds | Needs Improvement | PIP |
|---|---|---|---|---|---|---|---|
| American Indian or Alaska Native | 0.00 | 0.00 | 50.00 | 0.00 | 50.00 | 0.00 | 0.00 |
| Asian | 11.76 | 11.76 | 58.82 | 2.94 | 8.82 | 2.94 | 2.94 |
| Black or African American | 7.02 | 5.26 | 64.91 | 5.26 | 3.51 | 12.28 | 1.75 |
| Hispanic | 25.00 | 0.00 | 50.00 | 0.00 | 0.00 | 25.00 | 0.00 |
| Two or more races | 11.11 | 5.56 | 55.56 | 0.00 | 16.67 | 5.56 | 5.56 |
| White | 13.47 | 11.92 | 56.99 | 2.59 | 9.33 | 2.59 | 3.11 |
data = dfg_PerformancePerRace2  # data for the chart
category_names = data.columns.tolist()  # list of chart categories
labels = data.index.tolist()  # list of bar labels
data_cum = data.cumsum(axis=1)  # DF with the running total along each row (here: percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] - the number of columns in our data
# np.linspace - spreads data.shape[1] values evenly over the range from 0 to 1
# plt.colormaps[palette](...) - returns the RGBA values of the palette at the np.linspace positions
category_colors = plt.colormaps['BuPu'](np.linspace(0, 1, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15,6))  # create the figure and axes
# ax.invert_yaxis() # no need to invert the order of the bars
# ax.xaxis.set_visible(False) # the X-axis tick labels stay visible
ax.set_xlim(0, data.sum(axis=1).max())  # to keep the code generic, set the X-axis limits
# from 0 to the largest row total. Here the upper limit is about 100 (up to rounding)
# Draw the categories one after another in a loop,
# taking the column and its color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of column i in data)
    starts = data_cum.iloc[:, i] - widths  # left edge of each segment: running total minus its own width
    rects = ax.barh(labels, widths, left=starts, height=0.7,
                    label=colname, color=color)  # draw the rectangles for this category
    # choose the label color depending on the RGB values of the segment:
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # put value labels on the rectangles
    ax.bar_label(rects, label_type='center', color=text_color)
# Add the legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Set the title
ax.set_title("Performance score by employees' race/ethnicity, %", pad=40)
ax.title.set_size(14)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Create a temporary view, a query and a dataframe with the performance score and marital status of employees.
-- It is used to build the grid of charts
CREATE OR REPLACE TEMPORARY VIEW
PerformancePerMarital AS
SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY maritaldesc) AS employees_per_marital,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY maritaldesc))*100, 2)AS percent,
"Performance Score",
maritaldesc
FROM
hr_dataset
GROUP BY
"Performance Score",
maritaldesc
ORDER BY
"Performance Score",
maritaldesc
;
SELECT * FROM PerformancePerMarital
;
"""
dfg_PerformancePerMarital1 = pd.read_sql(sql_quiery, conn)
dfg_PerformancePerMarital1
| | empl_count | employees_per_marital | percent | Performance Score | maritaldesc |
|---|---|---|---|---|---|
| 0 | 3 | 30.0 | 10.00 | 90-day meets | Divorced |
| 1 | 15 | 123.0 | 12.20 | 90-day meets | Married |
| 2 | 13 | 137.0 | 9.49 | 90-day meets | Single |
| 3 | 4 | 30.0 | 13.33 | Exceeds | Divorced |
| 4 | 10 | 123.0 | 8.13 | Exceeds | Married |
| 5 | 2 | 12.0 | 16.67 | Exceeds | Separated |
| 6 | 12 | 137.0 | 8.76 | Exceeds | Single |
| 7 | 1 | 30.0 | 3.33 | Exceptional | Divorced |
| 8 | 1 | 123.0 | 0.81 | Exceptional | Married |
| 9 | 6 | 137.0 | 4.38 | Exceptional | Single |
| 10 | 1 | 8.0 | 12.50 | Exceptional | Widowed |
| 11 | 16 | 30.0 | 53.33 | Fully Meets | Divorced |
| 12 | 73 | 123.0 | 59.35 | Fully Meets | Married |
| 13 | 8 | 12.0 | 66.67 | Fully Meets | Separated |
| 14 | 81 | 137.0 | 59.12 | Fully Meets | Single |
| 15 | 3 | 8.0 | 37.50 | Fully Meets | Widowed |
| 16 | 4 | 30.0 | 13.33 | N/A- too early to review | Divorced |
| 17 | 14 | 123.0 | 11.38 | N/A- too early to review | Married |
| 18 | 1 | 12.0 | 8.33 | N/A- too early to review | Separated |
| 19 | 14 | 137.0 | 10.22 | N/A- too early to review | Single |
| 20 | 4 | 8.0 | 50.00 | N/A- too early to review | Widowed |
| 21 | 2 | 30.0 | 6.67 | Needs Improvement | Divorced |
| 22 | 6 | 123.0 | 4.88 | Needs Improvement | Married |
| 23 | 7 | 137.0 | 5.11 | Needs Improvement | Single |
| 24 | 4 | 123.0 | 3.25 | PIP | Married |
| 25 | 1 | 12.0 | 8.33 | PIP | Separated |
| 26 | 4 | 137.0 | 2.92 | PIP | Single |
# Build a grid of charts showing the performance score by employees' marital status
g=sns.catplot(
kind="bar",
x="maritaldesc",
y="percent",
hue="maritaldesc",
data=dfg_PerformancePerMarital1,
col_order=['N/A- too early to review', '90-day meets',
'Fully Meets', 'Exceptional', 'Exceeds', 'Needs Improvement', 'PIP'],
col="Performance Score",
col_wrap=4,
height=4,
aspect = 1,
palette="Dark2",
margin_titles=True,
sharex=False,
sharey=False,
dodge=False
)
g.fig.suptitle(
    "Performance score by employees' marital status",
    fontsize=16, x=0.50, y=1.03)
# Set the X-axis label and its font size
g.set_xlabels('Marital status', fontsize=12)
# Set the Y-axis label font size
g.set_ylabels(fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=0, fontsize=10)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=0.2)
plt.show()
Let us present the same data in a different form.
sql_quiery = \
"""
-- Create a query and a dataframe with the performance score and marital status of employees.
-- Use the temporary view created earlier
-- The result is used to build the chart
SELECT
maritaldesc,
SUM(
CASE WHEN "Performance Score" = 'N/A- too early to review'
THEN percent
END)
AS "N/A- too early to review",
SUM(
CASE WHEN "Performance Score" = '90-day meets'
THEN percent
END)
AS "90-day meets",
SUM(
CASE WHEN "Performance Score" = 'Fully Meets'
THEN percent
END)
AS "Fully Meets",
SUM(
CASE WHEN "Performance Score" = 'Exceptional'
THEN percent
END)
AS "Exceptional",
SUM(
CASE WHEN "Performance Score" = 'Exceeds'
THEN percent
END)
AS "Exceeds",
SUM(
CASE WHEN "Performance Score" = 'Needs Improvement'
THEN percent
END)
AS "Needs Improvement",
SUM(
CASE WHEN "Performance Score" = 'PIP'
THEN percent
END)
AS "PIP"
FROM
PerformancePerMarital
GROUP BY
maritaldesc
ORDER BY
maritaldesc
"""
dfg_PerformancePerMarital2 = pd.read_sql(sql_quiery, conn, index_col='maritaldesc').fillna(0)
dfg_PerformancePerMarital2
| maritaldesc | N/A- too early to review | 90-day meets | Fully Meets | Exceptional | Exceeds | Needs Improvement | PIP |
|---|---|---|---|---|---|---|---|
| Divorced | 13.33 | 10.00 | 53.33 | 3.33 | 13.33 | 6.67 | 0.00 |
| Married | 11.38 | 12.20 | 59.35 | 0.81 | 8.13 | 4.88 | 3.25 |
| Separated | 8.33 | 0.00 | 66.67 | 0.00 | 16.67 | 0.00 | 8.33 |
| Single | 10.22 | 9.49 | 59.12 | 4.38 | 8.76 | 5.11 | 2.92 |
| Widowed | 50.00 | 0.00 | 37.50 | 12.50 | 0.00 | 0.00 | 0.00 |
data = dfg_PerformancePerMarital2  # data for the chart
category_names = data.columns.tolist()  # list of chart categories
labels = data.index.tolist()  # list of bar labels
data_cum = data.cumsum(axis=1)  # DF with the running total along each row (here: percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] - the number of columns in our data
# np.linspace - spreads data.shape[1] values evenly over the range from 0 to 1
# plt.colormaps[palette](...) - returns the RGBA values of the palette at the np.linspace positions
category_colors = plt.colormaps['GnBu'](np.linspace(0, 1, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15,5))  # create the figure and axes
# ax.invert_yaxis() # no need to invert the order of the bars
# ax.xaxis.set_visible(False) # the X-axis tick labels stay visible
ax.set_xlim(0, data.sum(axis=1).max())  # to keep the code generic, set the X-axis limits
# from 0 to the largest row total. Here the upper limit is about 100 (up to rounding)
# Draw the categories one after another in a loop,
# taking the column and its color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of column i in data)
    starts = data_cum.iloc[:, i] - widths  # left edge of each segment: running total minus its own width
    rects = ax.barh(labels, widths, left=starts, height=0.7,
                    label=colname, color=color)  # draw the rectangles for this category
    # choose the label color depending on the RGB values of the segment:
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # put value labels on the rectangles
    ax.bar_label(rects, label_type='center', color=text_color)
# Add the legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Set the title
ax.set_title('Employee performance score by marital status, %', pad=40)
ax.title.set_size(14)
plt.show()
CONCLUSION
The higher relative shares of unmarried employees in most categories may be related to their personal circumstances: either they are able and willing to put more effort into work or, conversely, they lose productivity because of personal distress.
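One way to check the statement above against the pivot table is to compare two rows category by category. The sketch below does this for the Single and Married rows, copied from the table; reading "unmarried" as the Single row only is an assumption, and the variable names are illustrative.

```python
categories = ["N/A- too early to review", "90-day meets", "Fully Meets",
              "Exceptional", "Exceeds", "Needs Improvement", "PIP"]

# Shares in percent, copied from the Single and Married rows of the table
single = [10.22, 9.49, 59.12, 4.38, 8.76, 5.11, 2.92]
married = [11.38, 12.20, 59.35, 0.81, 8.13, 4.88, 3.25]

# Difference in percentage points (Single minus Married)
diff = {c: round(s - m, 2) for c, s, m in zip(categories, single, married)}

# Categories where the Single share is the higher one
higher_for_single = [c for c, d in diff.items() if d > 0]

for category, delta in diff.items():
    print(f"{category:26s}{delta:+.2f}")
print(higher_for_single)
```

Under this reading the Single share is higher in the Exceptional, Exceeds and Needs Improvement categories, and most gaps are within a few percentage points.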
This applies to all employees, both current and terminated.
sql_quiery = \
"""
-- Create a query and a dataframe with the employment status and hiring source of employees.
-- It is used to build the chart
WITH StatusPerSource AS
(SELECT
COUNT("Employee Number") AS empl_count,
SUM(COUNT("Employee Number")) OVER (PARTITION BY "Employee Source") AS employees_per_source,
ROUND(
COUNT("Employee Number") / (SUM(COUNT("Employee Number")) OVER (PARTITION BY "Employee Source"))*100, 2)
AS percent,
"Employment Status",
"Employee Source"
FROM
hr_dataset
GROUP BY
"Employment Status",
"Employee Source"
ORDER BY
"Employment Status",
"Employee Source"
)
SELECT
"Employee Source",
SUM(
CASE WHEN "Employment Status" = 'Future Start'
THEN percent
END)
AS "Future Start",
SUM(
CASE WHEN "Employment Status" = 'Active'
THEN percent
END)
AS "Active",
SUM(
CASE WHEN "Employment Status" = 'Leave of Absence'
THEN percent
END)
AS "Leave of Absence",
SUM(
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN percent
END)
AS "Voluntarily Terminated",
SUM(
CASE WHEN "Employment Status" = 'Terminated for Cause'
THEN percent
END)
AS "Terminated for Cause",
SUM(empl_count) AS "Employees_Count"
FROM
StatusPerSource
GROUP BY
"Employee Source"
ORDER BY
"Employee Source"
"""
dfg_StatusPerSource = pd.read_sql(sql_quiery, conn, index_col="Employee Source").fillna(0)
dfg_StatusPerSource
| Employee Source | Future Start | Active | Leave of Absence | Voluntarily Terminated | Terminated for Cause | Employees_Count |
|---|---|---|---|---|---|---|
| Billboard | 0.00 | 68.75 | 0.00 | 25.00 | 6.25 | 16.0 |
| Careerbuilder | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 1.0 |
| Company Intranet - Partner | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 1.0 |
| Diversity Job Fair | 3.45 | 31.03 | 10.34 | 55.17 | 0.00 | 29.0 |
| Employee Referral | 6.45 | 77.42 | 3.23 | 6.45 | 6.45 | 31.0 |
| Glassdoor | 0.00 | 50.00 | 7.14 | 28.57 | 14.29 | 14.0 |
| Indeed | 0.00 | 100.00 | 0.00 | 0.00 | 0.00 | 8.0 |
| Information Session | 0.00 | 50.00 | 25.00 | 25.00 | 0.00 | 4.0 |
| Internet Search | 0.00 | 66.67 | 0.00 | 33.33 | 0.00 | 6.0 |
| MBTA ads | 0.00 | 70.59 | 5.88 | 17.65 | 5.88 | 17.0 |
| Monster.com | 0.00 | 54.17 | 0.00 | 41.67 | 4.17 | 24.0 |
| Newspager/Magazine | 11.11 | 61.11 | 0.00 | 27.78 | 0.00 | 18.0 |
| On-campus Recruiting | 8.33 | 66.67 | 16.67 | 8.33 | 0.00 | 12.0 |
| On-line Web application | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 1.0 |
| Other | 11.11 | 55.56 | 0.00 | 33.33 | 0.00 | 9.0 |
| Pay Per Click | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 | 1.0 |
| Pay Per Click - Google | 0.00 | 85.71 | 0.00 | 14.29 | 0.00 | 21.0 |
| Professional Society | 0.00 | 80.00 | 5.00 | 10.00 | 5.00 | 20.0 |
| Search Engine - Google Bing Yahoo | 0.00 | 36.00 | 4.00 | 56.00 | 4.00 | 25.0 |
| Social Networks - Facebook Twitter etc | 0.00 | 27.27 | 0.00 | 54.55 | 18.18 | 11.0 |
| Vendor Referral | 20.00 | 46.67 | 6.67 | 20.00 | 6.67 | 15.0 |
| Website Banner Ads | 7.69 | 76.92 | 7.69 | 7.69 | 0.00 | 13.0 |
| Word of Mouth | 0.00 | 38.46 | 7.69 | 38.46 | 15.38 | 13.0 |
data = dfg_StatusPerSource.iloc[:, [True, True, True, True, True, False]]  # data for the chart (without the total count)
category_names = data.columns.tolist()  # list of chart categories
labels = data.index.tolist()  # list of bar labels
data_cum = data.cumsum(axis=1)  # DF with the running total along each row (here: percentages)
# Build an array of RGBA colors from the colormap, one per chart category:
# data.shape[1] - the number of columns in our data (defined in data= above)
# np.linspace - spreads data.shape[1] values evenly over the range from 0 to 0.75
# plt.colormaps[palette](...) - returns the RGBA values of the palette at the np.linspace positions
category_colors = plt.colormaps['viridis'](np.linspace(0, 0.75, data.shape[1]))
# Draw the chart
fig, ax = plt.subplots(figsize=(15,12))  # create the figure and axes
ax.invert_yaxis()  # invert the order of the bars
# ax.xaxis.set_visible(False) # the X-axis tick labels stay visible
ax.set_xlim(0, data.sum(axis=1).max())  # to keep the code generic, set the X-axis limits
# from 0 to the largest row total. Here the upper limit is about 100 (up to rounding)
# Draw the categories one after another in a loop,
# taking the column and its color from the category and color lists
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
    widths = data.iloc[:, i]  # segment lengths for this category (all rows of column i in data)
    starts = data_cum.iloc[:, i] - widths  # left edge of each segment: running total minus its own width
    rects = ax.barh(labels, widths, left=starts, height=0.8,
                    label=colname, color=color)  # draw the rectangles for this category
    # choose the label color depending on the RGB values of the segment:
    r, g, b, _ = color
    text_color = 'white' if r * g * b < 0.5 else 'black'
    # put value labels on the rectangles
    ax.bar_label(rects, label_type='center', color=text_color)
# Add the legend
ax.legend(ncol=len(category_names), bbox_to_anchor=(0.05, 1),
          loc='lower left', fontsize='small')
# Set the title
ax.set_title('Employment status by hiring source, %', pad=40)
ax.title.set_size(14)
plt.show()
CONCLUSION
The overall distribution of the company's current employees by hiring source was already covered in section 1.1.2.5. Now let us analyze how hiring sources are distributed across departments and positions, looking not only at current employees but also at those who were dismissed or who resigned.
sql_quiery = \
"""
WITH all_by_dptmnt AS
(
SELECT
"Employee Source",
department,
position,
COUNT("Employee Number") as all_employees
FROM hr_dataset
GROUP BY
"Employee Source",
position,
department
),
active_by_dptmnt as
(
SELECT
"Employee Source",
department,
position,
COUNT("Employee Number") as working_employees
FROM hr_dataset
WHERE -- keep only current employees
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
"Employee Source",
position,
department
)
(
SELECT
"Employee Source",
department,
position,
all_employees,
working_employees,
(all_employees - working_employees) AS terminated_employees
FROM
all_by_dptmnt
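JOIN active_by_dptmnt USING(department, "Employee Source", position)
-- Note: this inner join drops groups that have no current employees,
-- while the TOTALS row below uses LEFT JOIN and still counts them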
ORDER BY
"Employee Source",
department,
position
)
UNION ALL
(SELECT
'TOTALS',
'Totals',
'totals',
SUM(all_employees) AS "all_employees",
SUM(working_employees) AS "working_employees",
(SUM(all_employees) - SUM(working_employees)) AS "terminated_employees"
FROM
all_by_dptmnt
LEFT JOIN active_by_dptmnt USING(department, "Employee Source", position)
)
;
"""
df_deptmnt_and_source_dependance = pd.read_sql(sql_quiery, conn, index_col=["Employee Source",
'department',
'position'
])
df_deptmnt_and_source_dependance
| Employee Source | department | position | all_employees | working_employees | terminated_employees |
|---|---|---|---|---|---|
| Billboard | Production | Production Manager | 2.0 | 1.0 | 1.0 |
| | | Production Technician I | 10.0 | 7.0 | 3.0 |
| | | Production Technician II | 1.0 | 1.0 | 0.0 |
| | Sales | Area Sales Manager | 2.0 | 2.0 | 0.0 |
| Careerbuilder | Production | Production Technician II | 1.0 | 1.0 | 0.0 |
| Diversity Job Fair | Admin Offices | Accountant I | 1.0 | 1.0 | 0.0 |
| | | Sr. Accountant | 1.0 | 1.0 | 0.0 |
| | IT/IS | Database Administrator | 1.0 | 1.0 | 0.0 |
| | | IT Manager - Infra | 1.0 | 1.0 | 0.0 |
| | | IT Support | 1.0 | 1.0 | 0.0 |
| | Production | Production Technician I | 14.0 | 5.0 | 9.0 |
| | | Production Technician II | 2.0 | 1.0 | 1.0 |
| | Sales | Area Sales Manager | 1.0 | 1.0 | 0.0 |
| | | Sales Manager | 1.0 | 1.0 | 0.0 |
| Employee Referral | IT/IS | CIO | 1.0 | 1.0 | 0.0 |
| | | Database Administrator | 6.0 | 4.0 | 2.0 |
| | | Network Engineer | 3.0 | 3.0 | 0.0 |
| | | Sr. Network Engineer | 3.0 | 3.0 | 0.0 |
| | Production | Production Manager | 3.0 | 2.0 | 1.0 |
| | | Production Technician I | 10.0 | 9.0 | 1.0 |
| | | Production Technician II | 3.0 | 3.0 | 0.0 |
| | Sales | Area Sales Manager | 2.0 | 2.0 | 0.0 |
| Glassdoor | IT/IS | Database Administrator | 3.0 | 2.0 | 1.0 |
| | | IT Support | 1.0 | 1.0 | 0.0 |
| | | Network Engineer | 1.0 | 1.0 | 0.0 |
| | Production | Production Technician I | 4.0 | 2.0 | 2.0 |
| | | Production Technician II | 5.0 | 2.0 | 3.0 |
| Indeed | IT/IS | BI Developer | 4.0 | 4.0 | 0.0 |
| | | Data Architect | 1.0 | 1.0 | 0.0 |
| | | Senior BI Developer | 3.0 | 3.0 | 0.0 |
| Information Session | IT/IS | IT Support | 1.0 | 1.0 | 0.0 |
| | | Network Engineer | 1.0 | 1.0 | 0.0 |
| | Production | Production Technician II | 1.0 | 1.0 | 0.0 |
| Internet Search | Admin Offices | Accountant I | 1.0 | 1.0 | 0.0 |
| | Production | Production Manager | 2.0 | 1.0 | 1.0 |
| | | Production Technician I | 2.0 | 1.0 | 1.0 |
| | Sales | Area Sales Manager | 1.0 | 1.0 | 0.0 |
| MBTA ads | Production | Production Technician I | 12.0 | 9.0 | 3.0 |
| | | Production Technician II | 3.0 | 2.0 | 1.0 |
| | Sales | Director of Sales | 1.0 | 1.0 | 0.0 |
| | Software Engineering | Software Engineer | 1.0 | 1.0 | 0.0 |
| Monster.com | Admin Offices | Shared Services Manager | 1.0 | 1.0 | 0.0 |
| | IT/IS | Network Engineer | 2.0 | 1.0 | 1.0 |
| | Production | Production Manager | 1.0 | 1.0 | 0.0 |
| | | Production Technician I | 10.0 | 4.0 | 6.0 |
| | | Production Technician II | 5.0 | 2.0 | 3.0 |
| | Sales | Area Sales Manager | 4.0 | 3.0 | 1.0 |
| | Software Engineering | Software Engineer | 1.0 | 1.0 | 0.0 |
| Newspager/Magazine | Production | Production Technician I | 9.0 | 8.0 | 1.0 |
| | | Production Technician II | 8.0 | 5.0 | 3.0 |
| On-campus Recruiting | IT/IS | Sr. Network Engineer | 1.0 | 1.0 | 0.0 |
| | Production | Production Technician I | 9.0 | 9.0 | 0.0 |
| | | Production Technician II | 2.0 | 1.0 | 1.0 |
| Other | Admin Offices | Sr. Accountant | 1.0 | 1.0 | 0.0 |
| | Production | Director of Operations | 1.0 | 1.0 | 0.0 |
| | | Production Technician I | 2.0 | 1.0 | 1.0 |
| | | Production Technician II | 2.0 | 1.0 | 1.0 |
| | Sales | Area Sales Manager | 2.0 | 2.0 | 0.0 |
| Pay Per Click - Google | Admin Offices | Administrative Assistant | 1.0 | 1.0 | 0.0 |
| | Executive Office | President & CEO | 1.0 | 1.0 | 0.0 |
| | IT/IS | Database Administrator | 1.0 | 1.0 | 0.0 |
| | Production | Production Manager | 1.0 | 1.0 | 0.0 |
| | | Production Technician I | 5.0 | 4.0 | 1.0 |
| | | Production Technician II | 4.0 | 2.0 | 2.0 |
| | Sales | Area Sales Manager | 5.0 | 5.0 | 0.0 |
| | | Sales Manager | 1.0 | 1.0 | 0.0 |
| | Software Engineering | Software Engineer | 2.0 | 2.0 | 0.0 |
| Professional Society | IT/IS | BI Director | 1.0 | 1.0 | 0.0 |
| | | IT Director | 1.0 | 1.0 | 0.0 |
| | | IT Manager - DB | 1.0 | 1.0 | 0.0 |
| | | IT Manager - Support | 1.0 | 1.0 | 0.0 |
| | Production | Production Technician I | 10.0 | 8.0 | 2.0 |
| | | Production Technician II | 5.0 | 4.0 | 1.0 |
| | Sales | Area Sales Manager | 1.0 | 1.0 | 0.0 |
| Search Engine - Google Bing Yahoo | Production | Production Manager | 1.0 | 1.0 | 0.0 |
| | | Production Technician I | 16.0 | 7.0 | 9.0 |
| | Software Engineering | Software Engineer | 1.0 | 1.0 | 0.0 |
| | | Software Engineering Manager | 1.0 | 1.0 | 0.0 |
| Social Networks - Facebook Twitter etc | Production | Production Technician I | 8.0 | 3.0 | 5.0 |
| Vendor Referral | IT/IS | IT Support | 1.0 | 1.0 | 0.0 |
| | | Network Engineer | 2.0 | 2.0 | 0.0 |
| | | Sr. DBA | 3.0 | 1.0 | 2.0 |
| | | Sr. Network Engineer | 1.0 | 1.0 | 0.0 |
| | Production | Production Manager | 1.0 | 1.0 | 0.0 |
| | | Production Technician I | 1.0 | 1.0 | 0.0 |
| | | Production Technician II | 4.0 | 3.0 | 1.0 |
| | Software Engineering | Software Engineer | 1.0 | 1.0 | 0.0 |
| Website Banner Ads | Admin Offices | Accountant I | 1.0 | 1.0 | 0.0 |
| | | Administrative Assistant | 1.0 | 1.0 | 0.0 |
| | Production | Production Manager | 1.0 | 1.0 | 0.0 |
| | | Production Technician I | 3.0 | 2.0 | 1.0 |
| | Sales | Area Sales Manager | 7.0 | 7.0 | 0.0 |
| Word of Mouth | Production | Production Technician I | 8.0 | 4.0 | 4.0 |
| | | Production Technician II | 5.0 | 2.0 | 3.0 |
| TOTALS | Totals | totals | 310.0 | 208.0 | 102.0 |
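In the query above the detail rows come from an inner `JOIN`, while the TOTALS row uses `LEFT JOIN`. As a result, groups in which nobody is currently employed (for example the 'Pay Per Click' source, which is fully terminated in the status table) are absent from the detail rows but still counted in the totals. The sketch below shows the difference on a tiny in-memory SQLite table; the table `staff`, its columns and its three rows are illustrative assumptions, and `COALESCE` is one possible way to turn the missing count into 0.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE staff (source TEXT, status TEXT);
    INSERT INTO staff VALUES
        ('Billboard', 'Active'),
        ('Billboard', 'Voluntarily Terminated'),
        ('Pay Per Click', 'Voluntarily Terminated');
    """
)

# Same two-step layout as in the notebook query: all employees and current employees
cte = """
WITH all_by_source AS (
    SELECT source, COUNT(*) AS all_employees
    FROM staff
    GROUP BY source
),
active_by_source AS (
    SELECT source, COUNT(*) AS working_employees
    FROM staff
    WHERE status = 'Active'
    GROUP BY source
)
"""

# Inner join: sources without current employees disappear
inner_rows = demo.execute(cte + """
SELECT source, all_employees, working_employees
FROM all_by_source
JOIN active_by_source USING (source)
ORDER BY source
""").fetchall()

# Left join: every source is kept, a missing count becomes 0
left_rows = demo.execute(cte + """
SELECT source, all_employees, COALESCE(working_employees, 0)
FROM all_by_source
LEFT JOIN active_by_source USING (source)
ORDER BY source
""").fetchall()

print(inner_rows)
print(left_rows)
demo.close()
```

Whether the fully terminated groups should appear in the detail rows depends on the purpose of the table; the sketch only shows why the detail rows and the TOTALS row can disagree.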
The resulting table is hard to read, so let us visualize the data.
# Build the dataframe for the chart from the previous one
# Flatten the multi-index into columns (no custom index is set) and drop the last row (TOTALS)
dfg_deptmnt_and_source_dependance = df_deptmnt_and_source_dependance['all_employees'].reset_index([0,1,2])[0:-1]
#dfg_deptmnt_and_source_dependance
# Reduce the default font scale (so that the facet titles do not overlap)
sns.set(font_scale=.8)
# Define a bar chart with the employee counts per category
g = sns.catplot(x="position",
                y="all_employees",
                col="Employee Source",
                hue="department",
                legend=False,  # no legend inside the grid, a separate one is added below
                legend_out=True,  # the separate legend is placed outside the charts
                col_wrap=4,
                sharex=False,
                sharey=False,
                data=dfg_deptmnt_and_source_dependance,
                kind="bar",
                height=3,
                aspect=1,
                dodge=False
                )
# Set the figure title
g.fig.suptitle("Positions by hiring source for all employees, broken down by department, headcount",
               fontsize=16, x=0.32, y=1.06)
# Adjust the spacing between the individual charts
g.fig.subplots_adjust(hspace=1.5)
# Set the X-axis label font size
g.set_xlabels(fontsize=9)
# Set the Y-axis label and its font size
g.set_ylabels(label="Employee count", fontsize=9)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=90, fontsize=8)
# Add a common legend; set its size and position (x, y, width, height), number of columns and title
g.add_legend(bbox_to_anchor=(0, 0.84, 0.65, 0.2), loc='upper center', ncol=6, title='Departments')
# Restore the default font scale.
sns.set(font_scale=1)
plt.show()
CONCLUSION
An analysis of the hiring sources and their distribution across positions and departments shows the following.
# Create a DF to study how the company's hiring sources are distributed by date of hire
# There is no need to account for years in which nobody was hired: someone was hired every year.
# So the data is not tied to a continuous time scale.
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
DATE_TRUNC('year', "Date of Hire") AS year, -- truncate hire dates to the year
"Employee Source" AS empl_source
FROM
hr_dataset
),
year_of_hire AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
empl_source
FROM
employee_selection
GROUP BY
year,
empl_source
),
totals AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
'[TOTAL]' AS empl_source
FROM
employee_selection
GROUP BY
year
)
(SELECT *
FROM
year_of_hire)
UNION ALL
(SELECT *
FROM
totals)
ORDER BY
empl_source,
year
;
"""
dfg_source_over_date_hire = pd.read_sql(sql_quiery, conn)
#dfg_source_over_date_hire
# Build a grid of charts showing hires per hiring source by year of hire
# Each hiring source gets its own chart in the grid
g = sns.relplot(data=dfg_source_over_date_hire,
x="year",
y="employee_count",
col="empl_source",
hue="empl_source",
kind="line",
palette="gnuplot",
linewidth=4,
zorder=7,
col_wrap=4,
legend=False,
marker='o',
facet_kws=dict(sharex=False, sharey=False)
)
# Set extra parameters for each plot in the grid
for empl_source, ax in g.axes_dict.items():  # iterate over the individual panels
    # Add each panel's title as an annotation placed inside the plot area
    ax.text(.05, .95, empl_source, transform=ax.transAxes, fontweight="bold")
    # Draw "shadow" lines for all hire sources except [TOTAL] in every panel
    sns.lineplot(data=dfg_source_over_date_hire[dfg_source_over_date_hire['empl_source'] != '[TOTAL]'],
                 x="year",
                 y="employee_count",
                 units="empl_source",
                 estimator=None,
                 color=".7",
                 linewidth=1,
                 ax=ax
                 )
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout for the panels
g.tight_layout()
# Set the overall figure title
g.fig.suptitle(
"Hire source by year of hire",
fontsize=16, x=0.50, y=1.015)
plt.show()
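The query above appends per-year totals to the per-source counts with UNION ALL, labelling them '[TOTAL]'. The same pattern can be tried offline with a minimal sketch like the one below. It assumes an in-memory SQLite database instead of PostgreSQL, a hypothetical table `hr_demo` with made-up rows, and `strftime('%Y', ...)` as a stand-in for `DATE_TRUNC('year', ...)`; none of these names come from the course database.

```python
import sqlite3

# Hypothetical toy rows: (employee_number, date_of_hire, empl_source)
rows = [
    (1, "2011-01-10", "Indeed"),
    (2, "2011-05-02", "Indeed"),
    (3, "2011-07-05", "Referral"),
    (4, "2012-02-20", "Indeed"),
]

# Separate in-memory connection, so the notebook's own `conn` is left untouched
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE hr_demo (employee_number INTEGER, date_of_hire TEXT, empl_source TEXT)"
)
conn_demo.executemany("INSERT INTO hr_demo VALUES (?, ?, ?)", rows)

demo_query = """
WITH
employee_selection AS
  (SELECT
     employee_number,
     strftime('%Y', date_of_hire) AS year,  -- SQLite stand-in for DATE_TRUNC('year', ...)
     empl_source
   FROM hr_demo
  )
SELECT year, COUNT(employee_number) AS employee_count, empl_source
FROM employee_selection
GROUP BY year, empl_source
UNION ALL
SELECT year, COUNT(employee_number) AS employee_count, '[TOTAL]' AS empl_source
FROM employee_selection
GROUP BY year
ORDER BY empl_source, year
"""

# Each result row is (year, employee_count, empl_source)
result = conn_demo.execute(demo_query).fetchall()
for row in result:
    print(row)
```

Because the totals are ordinary rows with their own `empl_source` label, the plotting code can treat '[TOTAL]' as one more facet without any special handling.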
CONCLUSION
# Build a DataFrame to study how the hire sources of terminated employees are distributed by termination date
# As in the previous query, the data is not tied to a continuous time scale,
# so years without terminations do not appear.
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
DATE_TRUNC('year', "Date of Termination") AS year, -- truncate termination dates to the year
"Employee Source" AS empl_source
FROM
hr_dataset
WHERE -- keep only employees who are no longer with the company
"Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause'
),
year_of_term AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
empl_source
FROM
employee_selection
GROUP BY
year,
empl_source
),
totals AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
'[TOTAL]' AS empl_source
FROM
employee_selection
GROUP BY
year
)
(SELECT *
FROM
year_of_term)
UNION ALL
(SELECT *
FROM
totals)
ORDER BY
empl_source,
year
;
"""
dfg_source_over_date_term = pd.read_sql(sql_quiery, conn)
#dfg_source_over_date_term
# Build a grid of plots showing hire source by year of termination
# Put the data for each hire source in its own panel of the grid
g = sns.relplot(data=dfg_source_over_date_term,
x="year",
y="employee_count",
col="empl_source",
hue="empl_source",
kind="line",
palette="gnuplot",
linewidth=4,
zorder=7,
col_wrap=4,
legend=False,
marker='o',
facet_kws=dict(sharex=False, sharey=False)
)
# Set extra parameters for each plot in the grid
for empl_source, ax in g.axes_dict.items():  # iterate over the individual panels
    # Add each panel's title as an annotation placed inside the plot area
    ax.text(.05, .95, empl_source, transform=ax.transAxes, fontweight="bold")
    # Draw "shadow" lines for all hire sources except [TOTAL] in every panel
    # (taken from the termination DataFrame, not the hire one)
    sns.lineplot(data=dfg_source_over_date_term[dfg_source_over_date_term['empl_source'] != '[TOTAL]'],
                 x="year",
                 y="employee_count",
                 units="empl_source",
                 estimator=None,
                 color=".7",
                 linewidth=1,
                 ax=ax
                 )
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout for the panels
g.tight_layout()
# Set the overall figure title
g.fig.suptitle(
"Hire source by year of termination",
fontsize=16, x=0.50, y=1.015)
plt.show()
CONCLUSION
# Build a DataFrame to study the distribution of employee ages on the hire date
# by hire source
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
-- convert the difference between hire date and birth date to whole years
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER) AS age_of_hire,
"Employee Source" AS empl_source
FROM
hr_dataset
),
count_per_age AS
(SELECT
empl_source,
age_of_hire,
COUNT("Employee Number") AS employee_count
FROM
employee_selection
GROUP BY
empl_source,
age_of_hire
),
totals AS
(SELECT
'[TOTAL]' AS empl_source,
age_of_hire,
COUNT("Employee Number") AS employee_count
FROM
employee_selection
GROUP BY
age_of_hire
)
(SELECT *
FROM
count_per_age)
UNION ALL
(SELECT *
FROM
totals)
ORDER BY
empl_source,
age_of_hire
;
"""
dfg_source_over_age = pd.read_sql(sql_quiery, conn)
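#dfg_source_over_age
# Note: the DataFrame holds one row per (hire source, age) pair, so the box plot below
# is built from distinct ages and is not weighted by employee_count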
g=sns.catplot(x="empl_source",
y="age_of_hire",
hue="empl_source",
data=dfg_source_over_age,
kind='box',
height=5,
aspect=2.5,
palette="tab20b",
dodge=False
)
g.fig.suptitle(
"Distribution of employee age at hire by hire source",
fontsize=16, x=0.50, y=1.03)
# Set the X-axis label size
g.set_xlabels('Employee Source', fontsize=12)
# Set the Y-axis label size
g.set_ylabels('Age when hired', fontsize=12)
# Set the size and rotation of the tick labels under the boxes
g.set_xticklabels(rotation=90,fontsize=10)
# Set the Y-axis ticks
plt.yticks(list(range(15, 70, 5)))
plt.show()
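The query above derives age at hire as the whole-year part of `AGE("Date of Hire", dob)`. As a rough cross-check outside the database, the same idea can be sketched in plain Python. The helper name `full_years_between` and the sample dates are hypothetical and only illustrate the calculation.

```python
from datetime import date


def full_years_between(born: date, on: date) -> int:
    """Whole years elapsed from `born` to `on` (hypothetical helper)."""
    years = on.year - born.year
    # Subtract one year if the birthday has not yet occurred in the year of `on`
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


# Made-up example: born 1985-06-15, hired 2011-01-10
age_at_hire = full_years_between(date(1985, 6, 15), date(2011, 1, 10))
print(age_at_hire)
```

Comparing month and day as a tuple avoids dividing a day count by 365, which drifts around leap years.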
CONCLUSION
# Build a DataFrame to study the distribution of employees by sex
# across hire sources
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
sex,
"Employee Source" AS empl_source
FROM
hr_dataset
),
count_per_sex AS
(SELECT
empl_source,
sex,
COUNT("Employee Number") AS employee_count
FROM
employee_selection
GROUP BY
empl_source,
sex
)
SELECT *
FROM
count_per_sex
ORDER BY
empl_source,
sex
;
"""
dfg_source_over_sex = pd.read_sql(sql_quiery, conn)
#dfg_source_over_sex
# Plot the distribution of employees by sex across hire sources
g=sns.catplot(x="empl_source",
y="employee_count",
hue="sex",
hue_order=['Female', 'Male'],
data=dfg_source_over_sex,
kind='bar',
height=5,
aspect=2.5,
palette="Pastel1"
)
g.fig.suptitle(
"Distribution of employees by sex and hire source",
fontsize=16, x=0.50, y=1.03)
# Set the X-axis label size
g.set_xlabels('Employee Source', fontsize=12)
# Set the Y-axis label size
g.set_ylabels('Employee Count', fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=90,fontsize=10)
# Set the Y-axis ticks
plt.yticks(list(range(0, 21, 2)))
plt.show()
CONCLUSION
# Build a DataFrame to study the distribution of employees by race and ethnicity
# across hire sources
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
racedesc,
"Employee Source" AS empl_source
FROM
hr_dataset
),
count_per_race AS
(SELECT
empl_source,
racedesc,
COUNT("Employee Number") AS employee_count
FROM
employee_selection
GROUP BY
empl_source,
racedesc
)
SELECT *
FROM
count_per_race
ORDER BY
empl_source,
racedesc
;
"""
dfg_source_over_race = pd.read_sql(sql_quiery, conn)
#dfg_source_over_race
# Plot the distribution of employees by race across hire sources
g=sns.catplot(x="empl_source",
y="employee_count",
hue="racedesc",
data=dfg_source_over_race,
kind='bar',
height=5,
aspect=2.5,
palette="tab10",
alpha=1,
legend_out=False,
dodge=True
)
g.fig.suptitle(
"Distribution of employees by race and hire source",
fontsize=16, x=0.50, y=1.03)
# Set the X-axis label size
g.set_xlabels('Employee Source', fontsize=12)
# Set the Y-axis label size
g.set_ylabels('Employee Count', fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=90,fontsize=10)
# Set the Y-axis ticks
plt.yticks(list(range(0, 28, 2)))
plt.show()
CONCLUSION
This analysis assumes that employees' marital status has not changed since they were hired.
# Build a DataFrame to study the distribution of employees by marital status
# across hire sources
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
maritaldesc,
"Employee Source" AS empl_source
FROM
hr_dataset
),
count_per_marital AS
(SELECT
empl_source,
maritaldesc,
COUNT("Employee Number") AS employee_count
FROM
employee_selection
GROUP BY
empl_source,
maritaldesc
)
SELECT *
FROM
count_per_marital
ORDER BY
empl_source,
maritaldesc
;
"""
dfg_source_over_marital = pd.read_sql(sql_quiery, conn)
#dfg_source_over_marital
# Plot the distribution of employees by marital status across hire sources
g=sns.catplot(x="empl_source",
y="employee_count",
hue="maritaldesc",
data=dfg_source_over_marital,
kind='bar',
height=5,
aspect=2.5,
palette="Set2",
alpha=1,
legend_out=False,
dodge=True
)
g.fig.suptitle(
"Distribution of employees by marital status and hire source",
fontsize=16, x=0.50, y=1.03)
# Set the X-axis label size
g.set_xlabels('Employee Source', fontsize=12)
# Set the Y-axis label size
g.set_ylabels('Employee Count', fontsize=12)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=90,fontsize=10)
# Set the Y-axis ticks
plt.yticks(list(range(0, 18, 2)))
plt.show()
CONCLUSION
# Build a DataFrame to study how employees are distributed by hire date
# The time scale must also include months in which nobody was hired.
sql_quiery = \
"""
-- build subqueries for the selection from hr_dataset and for a common time scale running from the earliest
-- hire date to the latest (generate_series()). Join them on the month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
"Employment Status",
DATE_TRUNC('month', "Date of Hire") AS month -- truncate hire dates to the month (start of the month)
FROM
hr_dataset
),
month_of_hire AS
(SELECT
COUNT("Employee Number") AS employee_count,
"Employment Status",
month
FROM
employee_selection
GROUP BY
"Employment Status",
month
ORDER BY
"Employment Status",
month
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire"))
FROM hr_dataset), -- take the earliest hire date
(SELECT DATE_TRUNC('month', MAX("Date of Hire"))
FROM hr_dataset), -- take the latest hire date
'1 month') AS month -- use a one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
month_of_hire
USING(month)
;
"""
df_date_hire_over_empl_status = pd.read_sql(sql_quiery, conn)
df_date_hire_over_empl_status
| | month | employee_count | Employment Status |
|---|---|---|---|
| 0 | 2006-01-01 00:00:00+00:00 | 1.0 | Active |
| 1 | 2006-02-01 00:00:00+00:00 | NaN | None |
| 2 | 2006-03-01 00:00:00+00:00 | NaN | None |
| 3 | 2006-04-01 00:00:00+00:00 | NaN | None |
| 4 | 2006-05-01 00:00:00+00:00 | NaN | None |
| 5 | 2006-06-01 00:00:00+00:00 | NaN | None |
| 6 | 2006-07-01 00:00:00+00:00 | NaN | None |
| 7 | 2006-08-01 00:00:00+00:00 | NaN | None |
| 8 | 2006-09-01 00:00:00+00:00 | NaN | None |
| 9 | 2006-10-01 00:00:00+00:00 | NaN | None |
| 10 | 2006-11-01 00:00:00+00:00 | NaN | None |
| 11 | 2006-12-01 00:00:00+00:00 | NaN | None |
| 12 | 2007-01-01 00:00:00+00:00 | NaN | None |
| 13 | 2007-02-01 00:00:00+00:00 | NaN | None |
| 14 | 2007-03-01 00:00:00+00:00 | NaN | None |
| 15 | 2007-04-01 00:00:00+00:00 | NaN | None |
| 16 | 2007-05-01 00:00:00+00:00 | NaN | None |
| 17 | 2007-06-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 18 | 2007-07-01 00:00:00+00:00 | NaN | None |
| 19 | 2007-08-01 00:00:00+00:00 | NaN | None |
| 20 | 2007-09-01 00:00:00+00:00 | NaN | None |
| 21 | 2007-10-01 00:00:00+00:00 | NaN | None |
| 22 | 2007-11-01 00:00:00+00:00 | 1.0 | Active |
| 23 | 2007-12-01 00:00:00+00:00 | NaN | None |
| 24 | 2008-01-01 00:00:00+00:00 | 1.0 | Active |
| 25 | 2008-02-01 00:00:00+00:00 | NaN | None |
| 26 | 2008-03-01 00:00:00+00:00 | NaN | None |
| 27 | 2008-04-01 00:00:00+00:00 | NaN | None |
| 28 | 2008-05-01 00:00:00+00:00 | NaN | None |
| 29 | 2008-06-01 00:00:00+00:00 | NaN | None |
| 30 | 2008-07-01 00:00:00+00:00 | NaN | None |
| 31 | 2008-08-01 00:00:00+00:00 | NaN | None |
| 32 | 2008-09-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 33 | 2008-10-01 00:00:00+00:00 | 1.0 | Active |
| 34 | 2008-11-01 00:00:00+00:00 | NaN | None |
| 35 | 2008-12-01 00:00:00+00:00 | NaN | None |
| 36 | 2009-01-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 37 | 2009-01-01 00:00:00+00:00 | 3.0 | Active |
| 38 | 2009-02-01 00:00:00+00:00 | NaN | None |
| 39 | 2009-03-01 00:00:00+00:00 | NaN | None |
| 40 | 2009-04-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 41 | 2009-05-01 00:00:00+00:00 | NaN | None |
| 42 | 2009-06-01 00:00:00+00:00 | NaN | None |
| 43 | 2009-07-01 00:00:00+00:00 | 1.0 | Leave of Absence |
| 44 | 2009-08-01 00:00:00+00:00 | NaN | None |
| 45 | 2009-09-01 00:00:00+00:00 | NaN | None |
| 46 | 2009-10-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 47 | 2009-11-01 00:00:00+00:00 | NaN | None |
| 48 | 2009-12-01 00:00:00+00:00 | NaN | None |
| 49 | 2010-01-01 00:00:00+00:00 | NaN | None |
| 50 | 2010-02-01 00:00:00+00:00 | NaN | None |
| 51 | 2010-03-01 00:00:00+00:00 | NaN | None |
| 52 | 2010-04-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 53 | 2010-04-01 00:00:00+00:00 | 2.0 | Active |
| 54 | 2010-05-01 00:00:00+00:00 | 1.0 | Active |
| 55 | 2010-06-01 00:00:00+00:00 | NaN | None |
| 56 | 2010-07-01 00:00:00+00:00 | 1.0 | Active |
| 57 | 2010-08-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 58 | 2010-08-01 00:00:00+00:00 | 1.0 | Active |
| 59 | 2010-09-01 00:00:00+00:00 | 1.0 | Active |
| 60 | 2010-10-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 61 | 2010-11-01 00:00:00+00:00 | NaN | None |
| 62 | 2010-12-01 00:00:00+00:00 | NaN | None |
| 63 | 2011-01-01 00:00:00+00:00 | 9.0 | Voluntarily Terminated |
| 64 | 2011-01-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 65 | 2011-01-01 00:00:00+00:00 | 4.0 | Active |
| 66 | 2011-02-01 00:00:00+00:00 | 5.0 | Voluntarily Terminated |
| 67 | 2011-02-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 68 | 2011-03-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 69 | 2011-03-01 00:00:00+00:00 | 1.0 | Active |
| 70 | 2011-04-01 00:00:00+00:00 | 4.0 | Voluntarily Terminated |
| 71 | 2011-04-01 00:00:00+00:00 | 4.0 | Active |
| 72 | 2011-05-01 00:00:00+00:00 | 9.0 | Voluntarily Terminated |
| 73 | 2011-05-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 74 | 2011-05-01 00:00:00+00:00 | 2.0 | Active |
| 75 | 2011-06-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 76 | 2011-06-01 00:00:00+00:00 | 1.0 | Active |
| 77 | 2011-07-01 00:00:00+00:00 | 8.0 | Voluntarily Terminated |
| 78 | 2011-07-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 79 | 2011-07-01 00:00:00+00:00 | 4.0 | Active |
| 80 | 2011-08-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 81 | 2011-08-01 00:00:00+00:00 | 2.0 | Active |
| 82 | 2011-09-01 00:00:00+00:00 | 8.0 | Voluntarily Terminated |
| 83 | 2011-09-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 84 | 2011-09-01 00:00:00+00:00 | 1.0 | Active |
| 85 | 2011-10-01 00:00:00+00:00 | 1.0 | Active |
| 86 | 2011-11-01 00:00:00+00:00 | 6.0 | Voluntarily Terminated |
| 87 | 2011-11-01 00:00:00+00:00 | 4.0 | Active |
| 88 | 2011-12-01 00:00:00+00:00 | NaN | None |
| 89 | 2012-01-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 90 | 2012-01-01 00:00:00+00:00 | 1.0 | Leave of Absence |
| 91 | 2012-01-01 00:00:00+00:00 | 4.0 | Active |
| 92 | 2012-02-01 00:00:00+00:00 | 5.0 | Active |
| 93 | 2012-03-01 00:00:00+00:00 | 2.0 | Active |
| 94 | 2012-04-01 00:00:00+00:00 | 5.0 | Voluntarily Terminated |
| 95 | 2012-04-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 96 | 2012-04-01 00:00:00+00:00 | 4.0 | Active |
| 97 | 2012-05-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 98 | 2012-05-01 00:00:00+00:00 | 3.0 | Active |
| 99 | 2012-06-01 00:00:00+00:00 | NaN | None |
| 100 | 2012-07-01 00:00:00+00:00 | 4.0 | Active |
| 101 | 2012-08-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 102 | 2012-08-01 00:00:00+00:00 | 2.0 | Active |
| 103 | 2012-09-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 104 | 2012-09-01 00:00:00+00:00 | 1.0 | Active |
| 105 | 2012-10-01 00:00:00+00:00 | 1.0 | Active |
| 106 | 2012-11-01 00:00:00+00:00 | 2.0 | Active |
| 107 | 2012-12-01 00:00:00+00:00 | NaN | None |
| 108 | 2013-01-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 109 | 2013-01-01 00:00:00+00:00 | 3.0 | Active |
| 110 | 2013-02-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 111 | 2013-02-01 00:00:00+00:00 | 1.0 | Active |
| 112 | 2013-03-01 00:00:00+00:00 | NaN | None |
| 113 | 2013-04-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 114 | 2013-04-01 00:00:00+00:00 | 2.0 | Active |
| 115 | 2013-05-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 116 | 2013-05-01 00:00:00+00:00 | 2.0 | Active |
| 117 | 2013-06-01 00:00:00+00:00 | NaN | None |
| 118 | 2013-07-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 119 | 2013-07-01 00:00:00+00:00 | 2.0 | Leave of Absence |
| 120 | 2013-07-01 00:00:00+00:00 | 6.0 | Active |
| 121 | 2013-08-01 00:00:00+00:00 | 1.0 | Leave of Absence |
| 122 | 2013-08-01 00:00:00+00:00 | 5.0 | Active |
| 123 | 2013-09-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 124 | 2013-09-01 00:00:00+00:00 | 2.0 | Leave of Absence |
| 125 | 2013-09-01 00:00:00+00:00 | 6.0 | Active |
| 126 | 2013-10-01 00:00:00+00:00 | NaN | None |
| 127 | 2013-11-01 00:00:00+00:00 | 1.0 | Leave of Absence |
| 128 | 2013-11-01 00:00:00+00:00 | 6.0 | Active |
| 129 | 2013-12-01 00:00:00+00:00 | NaN | None |
| 130 | 2014-01-01 00:00:00+00:00 | 6.0 | Active |
| 131 | 2014-02-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 132 | 2014-02-01 00:00:00+00:00 | 5.0 | Active |
| 133 | 2014-03-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 134 | 2014-03-01 00:00:00+00:00 | 2.0 | Active |
| 135 | 2014-04-01 00:00:00+00:00 | NaN | None |
| 136 | 2014-05-01 00:00:00+00:00 | 2.0 | Leave of Absence |
| 137 | 2014-05-01 00:00:00+00:00 | 8.0 | Active |
| 138 | 2014-06-01 00:00:00+00:00 | NaN | None |
| 139 | 2014-07-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 140 | 2014-07-01 00:00:00+00:00 | 7.0 | Active |
| 141 | 2014-08-01 00:00:00+00:00 | 3.0 | Active |
| 142 | 2014-09-01 00:00:00+00:00 | 13.0 | Active |
| 143 | 2014-10-01 00:00:00+00:00 | NaN | None |
| 144 | 2014-11-01 00:00:00+00:00 | 2.0 | Leave of Absence |
| 145 | 2014-11-01 00:00:00+00:00 | 6.0 | Active |
| 146 | 2014-12-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 147 | 2015-01-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 148 | 2015-01-01 00:00:00+00:00 | 1.0 | Leave of Absence |
| 149 | 2015-01-01 00:00:00+00:00 | 7.0 | Active |
| 150 | 2015-02-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 151 | 2015-02-01 00:00:00+00:00 | 1.0 | Leave of Absence |
| 152 | 2015-02-01 00:00:00+00:00 | 5.0 | Active |
| 153 | 2015-03-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 154 | 2015-03-01 00:00:00+00:00 | 11.0 | Active |
| 155 | 2015-04-01 00:00:00+00:00 | NaN | None |
| 156 | 2015-05-01 00:00:00+00:00 | 2.0 | Active |
| 157 | 2015-06-01 00:00:00+00:00 | 1.0 | Future Start |
| 158 | 2015-06-01 00:00:00+00:00 | 1.0 | Active |
| 159 | 2015-07-01 00:00:00+00:00 | 1.0 | Future Start |
| 160 | 2015-08-01 00:00:00+00:00 | NaN | None |
| 161 | 2015-09-01 00:00:00+00:00 | NaN | None |
| 162 | 2015-10-01 00:00:00+00:00 | NaN | None |
| 163 | 2015-11-01 00:00:00+00:00 | NaN | None |
| 164 | 2015-12-01 00:00:00+00:00 | NaN | None |
| 165 | 2016-01-01 00:00:00+00:00 | 2.0 | Active |
| 166 | 2016-02-01 00:00:00+00:00 | NaN | None |
| 167 | 2016-03-01 00:00:00+00:00 | NaN | None |
| 168 | 2016-04-01 00:00:00+00:00 | NaN | None |
| 169 | 2016-05-01 00:00:00+00:00 | 1.0 | Future Start |
| 170 | 2016-06-01 00:00:00+00:00 | 3.0 | Future Start |
| 171 | 2016-07-01 00:00:00+00:00 | 5.0 | Future Start |
| 172 | 2016-08-01 00:00:00+00:00 | NaN | None |
| 173 | 2016-09-01 00:00:00+00:00 | 1.0 | Active |
| 174 | 2016-10-01 00:00:00+00:00 | 2.0 | Active |
| 175 | 2016-11-01 00:00:00+00:00 | NaN | None |
| 176 | 2016-12-01 00:00:00+00:00 | NaN | None |
| 177 | 2017-01-01 00:00:00+00:00 | 1.0 | Active |
| 178 | 2017-02-01 00:00:00+00:00 | 3.0 | Active |
| 179 | 2017-03-01 00:00:00+00:00 | NaN | None |
| 180 | 2017-04-01 00:00:00+00:00 | 2.0 | Active |
# Plot the distribution of employees by month and year of hire, split by employment status
g=sns.relplot(data=df_date_hire_over_empl_status,
x="month",
y="employee_count",
hue="Employment Status",
hue_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
height=6,
aspect=2.1,
palette="Set1",
alpha=0.9,
ci=False,
kind="line",
marker='o'
)
# Define the scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=df_date_hire_over_empl_status['month'].min(),
end= df_date_hire_over_empl_status['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
plt.yticks(ticks=list(range(0, 15, 1)), fontsize=14) # set the Y ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label size
g.set_ylabels("Employee Count", fontsize=14) # Y-axis label size
# Figure title:
g.fig.suptitle("Employees by month and year of hire, split by employment status",
fontsize=16, y=1.025)
plt.show()
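The monthly query relies on `generate_series()` plus a LEFT OUTER JOIN so that months without hires stay on the time scale as empty rows. Purely to illustrate what that join does (the report itself prepares the frame in SQL), here is a small pandas sketch. The `hires` frame and its values are made up for the example.

```python
import pandas as pd

# Hypothetical monthly hire counts with a gap (no rows for February and March)
hires = pd.DataFrame({
    "month": pd.to_datetime(["2011-01-01", "2011-04-01"]),
    "employee_count": [4, 2],
})

# Continuous monthly scale, the pandas counterpart of generate_series(..., '1 month')
months = pd.DataFrame({
    "month": pd.date_range(hires["month"].min(), hires["month"].max(), freq="MS")
})

# A left join keeps every month of the scale; months without hires get NaN
full_scale = months.merge(hires, on="month", how="left")
print(full_scale)
```

The NaN rows are what appear as `NaN` / `None` in the DataFrame output above, and they are what lets the line plot show breaks instead of connecting distant months.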
CONCLUSION
# Build a DataFrame to study how employees are distributed by hire date depending on their age
# at the time of hire
# The time scale must also include months in which nobody was hired.
sql_quiery = \
"""
-- build subqueries for the selection from hr_dataset and for a common time scale running from the earliest
-- hire date to the latest (generate_series()). Join them on the month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER) AS age, -- age in whole years on the hire date
DATE_TRUNC('month', "Date of Hire") AS month -- truncate hire dates to the month (start of the month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire"))
FROM hr_dataset), -- take the earliest hire date
(SELECT DATE_TRUNC('month', MAX("Date of Hire"))
FROM hr_dataset), -- take the latest hire date
'1 month') AS month -- use a one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_hire_over_age = pd.read_sql(sql_quiery, conn)
#dfg_date_hire_over_age
# Plot age at hire against hire date
g=sns.relplot(x="month",
y="age",
data=dfg_date_hire_over_age,
kind='scatter',
height=5,
aspect=2.8
)
# Define the scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_date_hire_over_age['month'].min(),
end= dfg_date_hire_over_age['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
plt.yticks(ticks=list(range(15, 70, 5)), fontsize=14) # set the Y ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label size
g.set_ylabels("Age", fontsize=14) # Y-axis label size
# Figure title:
g.fig.suptitle("Hire date versus employee age on the hire date",
fontsize=16, y=1.025)
# Highlight the "critical" regions of the plot in a separate color
g.axes[0,0].axvspan("2011-01", "2012-05", facecolor="orange", alpha=0.25)
g.axes[0,0].axvspan("2012-07", "2013-05", facecolor="orange", alpha=0.25)
g.axes[0,0].axvspan("2013-07", "2015-05", facecolor="orange", alpha=0.25)
g.axes[0,0].axvspan("2016-05", "2017-04", facecolor="orange", alpha=0.25)
g.axes[0,0].axhspan(18, 45, facecolor="deeppink", alpha=0.25)
plt.show()
CONCLUSION
# Build a DataFrame to study how employees are distributed by hire date depending on sex
# The time scale must also include months in which nobody was hired.
sql_quiery = \
"""
-- build subqueries for the selection from hr_dataset and for a common time scale running from the earliest
-- hire date to the latest (generate_series()). Join them on the month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
sex,
DATE_TRUNC('month', "Date of Hire") AS month -- truncate hire dates to the month (start of the month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire"))
FROM hr_dataset), -- take the earliest hire date
(SELECT DATE_TRUNC('month', MAX("Date of Hire"))
FROM hr_dataset), -- take the latest hire date
'1 month') AS month -- use a one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_hire_over_sex = pd.read_sql(sql_quiery, conn)
#dfg_date_hire_over_sex
# Plot hire date against employee sex
g=sns.displot(x="month",
data=dfg_date_hire_over_sex,
col="sex",
col_wrap=1,
hue="sex",
binwidth=91,
kde=True,
height=4,
aspect=2.8,
facet_kws=dict(sharex=True, sharey=True)
)
# Define the scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_date_hire_over_sex['month'].min(),
end= dfg_date_hire_over_sex['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label size
g.set_ylabels(fontsize=14) # Y-axis label size
# Figure title:
g.fig.suptitle("Hire date versus employee sex",
fontsize=16, y=1.025)
plt.show()
CONCLUSION
# Build a DataFrame to study how employees are distributed by hire date depending on
# race and ethnicity.
# The time scale must also include months in which nobody was hired.
sql_quiery = \
"""
-- build subqueries for the selection from hr_dataset and for a common time scale running from the earliest
-- hire date to the latest (generate_series()). Join them on the month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
racedesc,
DATE_TRUNC('month', "Date of Hire") AS month -- truncate hire dates to the month (start of the month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire"))
FROM hr_dataset), -- take the earliest hire date
(SELECT DATE_TRUNC('month', MAX("Date of Hire"))
FROM hr_dataset), -- take the latest hire date
'1 month') AS month -- use a one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_hire_over_racedesc = pd.read_sql(sql_quiery, conn)
#dfg_date_hire_over_racedesc
# Plot hire date against employee race and ethnicity
g=sns.displot(x="month",
data=dfg_date_hire_over_racedesc,
col="racedesc",
col_wrap=1,
hue="racedesc",
binwidth=91,
kde=True,
height=4,
aspect=2.8,
facet_kws=dict(sharex=True, sharey=True)
)
# Define the scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_date_hire_over_racedesc['month'].min(),
end= dfg_date_hire_over_racedesc['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label size
g.set_ylabels(fontsize=14) # Y-axis label size
# Figure title:
g.fig.suptitle("Hire date versus employee race and ethnicity",
fontsize=16, y=1.025)
plt.show()
CONCLUSION
# Build a DataFrame to study how employees are distributed by hire date depending on
# marital status.
# The time scale must also include months in which nobody was hired.
sql_quiery = \
"""
-- build subqueries for the selection from hr_dataset and for a common time scale running from the earliest
-- hire date to the latest (generate_series()). Join them on the month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
maritaldesc,
DATE_TRUNC('month', "Date of Hire") AS month -- truncate hire dates to the month (start of the month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire"))
FROM hr_dataset), -- take the earliest hire date
(SELECT DATE_TRUNC('month', MAX("Date of Hire"))
FROM hr_dataset), -- take the latest hire date
'1 month') AS month -- use a one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_hire_over_maritaldesc = pd.read_sql(sql_quiery, conn)
#dfg_date_hire_over_maritaldesc
# Plot hire date against employee marital status
g=sns.displot(x="month",
data=dfg_date_hire_over_maritaldesc,
col="maritaldesc",
col_wrap=1,
hue="maritaldesc",
binwidth=91,
kde=True,
height=4,
aspect=2.8,
facet_kws=dict(sharex=True, sharey=True)
)
# Define the scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_date_hire_over_maritaldesc['month'].min(),
end= dfg_date_hire_over_maritaldesc['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label size
g.set_ylabels(fontsize=14) # Y-axis label size
# Figure title:
g.fig.suptitle("Hire date versus employee marital status",
fontsize=16, y=1.025)
plt.show()
CONCLUSION
To compare salary levels, we will also use data on employees who are currently employed. For this comparison it does not matter whether an employee is working now, will start in the near future, or is on leave. We combine all such employees into the category Current Stuff.
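The grouping described here, together with the yearly salary figure that the following query derives as `"Pay Rate" * 2080`, can be sketched in plain Python as shown below. The helper names are hypothetical, and reading 2080 as 40 hours per week times 52 weeks is an assumption; the source only shows the multiplier itself.

```python
# Assumption: 2080 = 40 hours/week * 52 weeks; the query only shows the multiplier
HOURS_PER_YEAR = 2080

# Statuses that the text above merges into one category
CURRENT_STATUSES = {"Active", "Leave of Absence", "Future Start"}


def status_category(employment_status: str) -> str:
    """Map a raw employment status to its comparison category (hypothetical helper)."""
    if employment_status in CURRENT_STATUSES:
        return "Current Stuff"  # label spelled exactly as in the query
    # Other statuses pass through unchanged; the SQL CASE would yield NULL
    # for any status it does not list explicitly
    return employment_status


def usd_per_year(pay_rate: float) -> int:
    """Convert an hourly pay rate to a rounded yearly amount in USD."""
    return round(pay_rate * HOURS_PER_YEAR)


print(status_category("Leave of Absence"), usd_per_year(23.0))
```

For an hourly rate of 23.0 the helper gives 47840, the same kind of value that appears in the `usd_per_year` column of the output.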
sql_quiery = \
"""
-- Build a query and DataFrame with termination reasons and salary rates by position,
-- regardless of how many people work or have worked in each position
-- It is used to build a grid of plots
WITH
TermPositionsSchedule AS
(SELECT
department,
position,
COUNT("Employee Number") AS empl_count
FROM hr_dataset
GROUP BY
department,
position
ORDER BY
department,
position
),
TermEmployeesBySchedule AS
(SELECT
department,
position,
"Employee Number",
"Employee Name",
CASE
WHEN
("Employment Status" = 'Active' OR
"Employment Status" = 'Leave of Absence' OR
"Employment Status" = 'Future Start')
THEN 'Current Stuff'
WHEN "Employment Status" = 'Voluntarily Terminated'
THEN 'Voluntarily Terminated'
WHEN "Employment Status" = 'Terminated for Cause'
THEN 'Terminated for Cause'
END
AS "Employment Status",
"Reason For Term",
CAST("Pay Rate" * 2080 AS INTEGER) AS usd_per_year
FROM hr_dataset
ORDER BY
department,
position,
"Pay Rate",
"Employee Number",
"Reason For Term"
)
SELECT
*
FROM
TermPositionsSchedule
LEFT JOIN
TermEmployeesBySchedule
USING (department, position)
;
"""
dfg_TermReasonAndPayrate = pd.read_sql(sql_quiery, conn)
dfg_TermReasonAndPayrate
| | department | position | empl_count | Employee Number | Employee Name | Employment Status | Reason For Term | usd_per_year |
|---|---|---|---|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | 3 | 1106026572 | LaRotonda, William | Current Stuff | N/A - still employed | 47840 |
| 1 | Admin Offices | Accountant I | 3 | 1103024456 | Brown, Mia | Current Stuff | N/A - still employed | 59280 |
| 2 | Admin Offices | Accountant I | 3 | 1302053333 | Steans, Tyrone | Current Stuff | N/A - still employed | 60320 |
| 3 | Admin Offices | Administrative Assistant | 3 | 1307059817 | Singh, Nan | Current Stuff | N/A - still employed | 34445 |
| 4 | Admin Offices | Administrative Assistant | 3 | 711007713 | Smith, Leigh Ann | Voluntarily Terminated | career change | 42640 |
| 5 | Admin Offices | Administrative Assistant | 3 | 1211050782 | Howard, Estelle | Current Stuff | N/A - still employed | 44720 |
| 6 | Admin Offices | Shared Services Manager | 2 | 1102024115 | LeBlanc, Brandon R | Current Stuff | N/A - still employed | 114400 |
| 7 | Admin Offices | Shared Services Manager | 2 | 1206043417 | Quinn, Sean | Voluntarily Terminated | career change | 114400 |
| 8 | Admin Offices | Sr. Accountant | 2 | 1201031308 | Foster-Baker, Amy | Current Stuff | N/A - still employed | 72696 |
| 9 | Admin Offices | Sr. Accountant | 2 | 1307060188 | Boutwell, Bonalyn | Current Stuff | N/A - still employed | 72696 |
| 10 | Executive Office | President & CEO | 1 | 1001495124 | King, Janet | Current Stuff | N/A - still employed | 166400 |
| 11 | IT/IS | BI Developer | 4 | 1009919940 | Rachael, Maggie | Current Stuff | N/A - still employed | 93600 |
| 12 | IT/IS | BI Developer | 4 | 1009919990 | Westinghouse, Matthew | Current Stuff | N/A - still employed | 93600 |
| 13 | IT/IS | BI Developer | 4 | 1009920000 | Hubert, Robert | Current Stuff | N/A - still employed | 93600 |
| 14 | IT/IS | BI Developer | 4 | 1009919980 | Smith, Jason | Current Stuff | N/A - still employed | 95680 |
| 15 | IT/IS | BI Director | 1 | 1009919920 | Champaigne, Brian | Current Stuff | N/A - still employed | 132080 |
| 16 | IT/IS | CIO | 1 | 1112030816 | Zamora, Jennifer | Current Stuff | N/A - still employed | 135200 |
| 17 | IT/IS | Data Architect | 1 | 1009919950 | Roper, Katie | Current Stuff | N/A - still employed | 114400 |
| 18 | IT/IS | Database Administrator | 13 | 808010278 | Simard, Kramer | Current Stuff | N/A - still employed | 62816 |
| 19 | IT/IS | Database Administrator | 13 | 1110029732 | Zhou, Julia | Current Stuff | N/A - still employed | 65312 |
| 20 | IT/IS | Database Administrator | 13 | 1105025718 | Horton, Jayne | Current Stuff | N/A - still employed | 70720 |
| 21 | IT/IS | Database Administrator | 13 | 1406068403 | Murray, Thomas | Current Stuff | N/A - still employed | 73840 |
| 22 | IT/IS | Database Administrator | 13 | 1407068885 | Roby, Lori | Current Stuff | N/A - still employed | 82264 |
| 23 | IT/IS | Database Administrator | 13 | 1003018246 | Johnson, Noelle | Current Stuff | N/A - still employed | 83200 |
| 24 | IT/IS | Database Administrator | 13 | 1410071156 | Hernandez, Daniff | Terminated for Cause | no-call, no-show | 83408 |
| 25 | IT/IS | Database Administrator | 13 | 1102023965 | Pearson, Randall | Voluntarily Terminated | performance | 85280 |
| 26 | IT/IS | Database Administrator | 13 | 1203032255 | Rogers, Ivan | Current Stuff | N/A - still employed | 87776 |
| 27 | IT/IS | Database Administrator | 13 | 1108027853 | Petrowsky, Thelma | Current Stuff | N/A - still employed | 88920 |
| 28 | IT/IS | Database Administrator | 13 | 1102024056 | Becker, Renee | Terminated for Cause | performance | 89440 |
| 29 | IT/IS | Database Administrator | 13 | 1111030148 | Salter, Jason | Voluntarily Terminated | hours | 93600 |
| 30 | IT/IS | Database Administrator | 13 | 905013738 | Goble, Taisha | Terminated for Cause | no-call, no-show | 100880 |
| 31 | IT/IS | IT Director | 1 | 1192991000 | Foss, Jason | Current Stuff | N/A - still employed | 135200 |
| 32 | IT/IS | IT Manager - DB | 2 | 1001175250 | Ruiz, Ricardo | Voluntarily Terminated | hours | 43680 |
| 33 | IT/IS | IT Manager - DB | 2 | 1106026933 | Roup,Simon | Current Stuff | N/A - still employed | 128960 |
| 34 | IT/IS | IT Manager - Infra | 1 | 1011022863 | Monroe, Peter | Current Stuff | N/A - still employed | 131040 |
| 35 | IT/IS | IT Manager - Support | 1 | 1101023754 | Dougall, Eric | Current Stuff | N/A - still employed | 133120 |
| 36 | IT/IS | IT Support | 4 | 602000312 | Lindsay, Leonara | Current Stuff | N/A - still employed | 54080 |
| 37 | IT/IS | IT Support | 4 | 1203032263 | Soto, Julia | Current Stuff | N/A - still employed | 57179 |
| 38 | IT/IS | IT Support | 4 | 1301052902 | Clayton, Rick | Current Stuff | N/A - still employed | 60299 |
| 39 | IT/IS | IT Support | 4 | 1501072093 | Galia, Lisa | Current Stuff | N/A - still employed | 65312 |
| 40 | IT/IS | Network Engineer | 9 | 1001956578 | Morway, Tanya | Current Stuff | N/A - still employed | 56160 |
| 41 | IT/IS | Network Engineer | 9 | 1104025466 | Tredinnick, Neville | Voluntarily Terminated | medical issues | 58240 |
| 42 | IT/IS | Network Engineer | 9 | 1101023540 | Dolan, Linda | Current Stuff | N/A - still employed | 76960 |
| 43 | IT/IS | Network Engineer | 9 | 1988299991 | Gonzalez, Maria | Current Stuff | N/A - still employed | 81120 |
| 44 | IT/IS | Network Engineer | 9 | 1102024173 | Cisco, Anthony | Current Stuff | N/A - still employed | 87360 |
| 45 | IT/IS | Network Engineer | 9 | 1012023013 | Merlos, Carlos | Current Stuff | N/A - still employed | 89440 |
| 46 | IT/IS | Network Engineer | 9 | 1212052023 | Bacong, Alejandro | Current Stuff | N/A - still employed | 93600 |
| 47 | IT/IS | Network Engineer | 9 | 906014183 | Shepard, Anita | Current Stuff | N/A - still employed | 97760 |
| 48 | IT/IS | Network Engineer | 9 | 1411071506 | Turpin, Jumil | Current Stuff | N/A - still employed | 102128 |
| 49 | IT/IS | Senior BI Developer | 3 | 1009919930 | Le, Binh | Current Stuff | N/A - still employed | 104520 |
| 50 | IT/IS | Senior BI Developer | 3 | 1009919970 | Wang, Charlie | Current Stuff | N/A - still employed | 106080 |
| 51 | IT/IS | Senior BI Developer | 3 | 1009919960 | Navathe, Kurt | Current Stuff | N/A - still employed | 108680 |
| 52 | IT/IS | Sr. DBA | 4 | 1412071562 | Favis, Donald | Terminated for Cause | hours | 121056 |
| 53 | IT/IS | Sr. DBA | 4 | 1111030266 | Roehrich, Bianca | Voluntarily Terminated | Another position | 121680 |
| 54 | IT/IS | Sr. DBA | 4 | 1010022337 | Carr, Claudia N | Current Stuff | N/A - Has not started yet | 127504 |
| 55 | IT/IS | Sr. DBA | 4 | 1307060199 | Ait Sidi, Karthikeyan | Voluntarily Terminated | career change | 128960 |
| 56 | IT/IS | Sr. Network Engineer | 5 | 1308060959 | South, Joe | Current Stuff | N/A - still employed | 110240 |
| 57 | IT/IS | Sr. Network Engineer | 5 | 904013591 | Semizoglou, Jeremiah | Current Stuff | N/A - Has not started yet | 111904 |
| 58 | IT/IS | Sr. Network Engineer | 5 | 1411071312 | Daniele, Ann | Current Stuff | N/A - still employed | 112528 |
| 59 | IT/IS | Sr. Network Engineer | 5 | 1301052347 | Warfield, Sarah | Current Stuff | N/A - still employed | 114816 |
| 60 | IT/IS | Sr. Network Engineer | 5 | 1108028108 | Lajiri, Jyoti | Current Stuff | N/A - still employed | 116896 |
| 61 | Production | Director of Operations | 1 | 1006020066 | Bramante, Elisa | Current Stuff | N/A - still employed | 124800 |
| 62 | Production | Production Manager | 14 | 1410071026 | Wallace, Courtney E | Voluntarily Terminated | Another position | 69680 |
| 63 | Production | Production Manager | 14 | 1402065355 | Peterson, Ebonee | Voluntarily Terminated | Another position | 80080 |
| 64 | Production | Production Manager | 14 | 1403065874 | Immediato, Walter | Voluntarily Terminated | unhappy | 87360 |
| 65 | Production | Production Manager | 14 | 1001944783 | Hogland, Jonathan | Terminated for Cause | attendance | 100880 |
| 66 | Production | Production Manager | 14 | 1303054580 | Bozzi, Charles | Voluntarily Terminated | retiring | 105040 |
| 67 | Production | Production Manager | 14 | 1409070147 | Dunn, Amy | Current Stuff | N/A - still employed | 106080 |
| 68 | Production | Production Manager | 14 | 1102024149 | Spirea, Kelley | Current Stuff | N/A - still employed | 108160 |
| 69 | Production | Production Manager | 14 | 1000974650 | Stanley, David | Current Stuff | N/A - still employed | 110240 |
| 70 | Production | Production Manager | 14 | 1107027351 | Miller, Brannon | Current Stuff | N/A - still employed | 110240 |
| 71 | Production | Production Manager | 14 | 1307060077 | Gray, Elijiah | Current Stuff | N/A - still employed | 112320 |
| 72 | Production | Production Manager | 14 | 1501072311 | Albert, Michael | Current Stuff | N/A - still employed | 113360 |
| 73 | Production | Production Manager | 14 | 1103024679 | Liebig, Ketsia | Current Stuff | N/A - still employed | 114400 |
| 74 | Production | Production Manager | 14 | 1110029990 | Butler, Webster L | Current Stuff | N/A - still employed | 114400 |
| 75 | Production | Production Manager | 14 | 1405067298 | Sullivan, Kissy | Current Stuff | N/A - still employed | 114400 |
| 76 | Production | Production Technician I | 136 | 1103024859 | Gross, Paula | Voluntarily Terminated | more money | 29120 |
| 77 | Production | Production Technician I | 136 | 1111030244 | Stanford,Barbara M | Current Stuff | N/A - still employed | 29120 |
| 78 | Production | Production Technician I | 136 | 1304055683 | Knapp, Bradley J | Current Stuff | N/A - still employed | 29120 |
| 79 | Production | Production Technician I | 136 | 1407069061 | Sutwell, Barbara | Current Stuff | N/A - still employed | 29120 |
| 80 | Production | Production Technician I | 136 | 1409070245 | Meads, Elizabeth | Voluntarily Terminated | Another position | 29120 |
| 81 | Production | Production Technician I | 136 | 1001109612 | Darson, Jene'ya | Current Stuff | N/A - still employed | 31200 |
| 82 | Production | Production Technician I | 136 | 1001450968 | Cole, Spencer | Terminated for Cause | performance | 31200 |
| 83 | Production | Production Technician I | 136 | 1008020960 | Gilles, Alex | Voluntarily Terminated | military | 31200 |
| 84 | Production | Production Technician I | 136 | 1101023619 | Wallace, Theresa | Voluntarily Terminated | career change | 31200 |
| 85 | Production | Production Technician I | 136 | 1102024121 | Motlagh, Dawn | Current Stuff | N/A - still employed | 31200 |
| 86 | Production | Production Technician I | 136 | 1106026579 | Gordon, David | Current Stuff | N/A - still employed | 31200 |
| 87 | Production | Production Technician I | 136 | 1107027575 | LeBel, Jonathan R | Terminated for Cause | attendance | 31200 |
| 88 | Production | Production Technician I | 136 | 1201031032 | MacLennan, Samuel | Voluntarily Terminated | hours | 31200 |
| 89 | Production | Production Technician I | 136 | 1205033102 | Shields, Seffi | Current Stuff | N/A - still employed | 31200 |
| 90 | Production | Production Technician I | 136 | 1211051232 | Zima, Colleen | Current Stuff | N/A - still employed | 31200 |
| 91 | Production | Production Technician I | 136 | 1403066020 | Ngodup, Shari | Current Stuff | N/A - still employed | 31200 |
| 92 | Production | Production Technician I | 136 | 1405067501 | Tavares, Desiree | Voluntarily Terminated | Another position | 31200 |
| 93 | Production | Production Technician I | 136 | 1599991009 | Cockel, James | Current Stuff | N/A - still employed | 31200 |
| 94 | Production | Production Technician I | 136 | 807010161 | Sewkumar, Nori | Current Stuff | N/A - still employed | 31616 |
| 95 | Production | Production Technician I | 136 | 1206042315 | Linares, Marilyn | Voluntarily Terminated | unhappy | 31720 |
| 96 | Production | Production Technician I | 136 | 1403066069 | Cornett, Lisa | Current Stuff | N/A - still employed | 32760 |
| 97 | Production | Production Technician I | 136 | 1001417624 | Anderson, Carol | Voluntarily Terminated | return to school | 33280 |
| 98 | Production | Production Technician I | 136 | 1011022926 | Purinton, Janine | Voluntarily Terminated | unhappy | 33280 |
| 99 | Production | Production Technician I | 136 | 1109029366 | Bernstein, Sean | Current Stuff | N/A - still employed | 33280 |
| 100 | Production | Production Technician I | 136 | 1204032927 | Girifalco, Evelyn | Current Stuff | N/A - still employed | 33280 |
| 101 | Production | Production Technician I | 136 | 1208048062 | Chace, Beatrice | Current Stuff | N/A - still employed | 33280 |
| 102 | Production | Production Technician I | 136 | 1308060366 | Billis, Helen | Current Stuff | N/A - still employed | 33280 |
| 103 | Production | Production Technician I | 136 | 1308060671 | Williams, Jacquelyn | Voluntarily Terminated | relocation out of area | 33280 |
| 104 | Production | Production Technician I | 136 | 1401064562 | Punjabhi, Louis | Current Stuff | N/A - still employed | 33280 |
| 105 | Production | Production Technician I | 136 | 1403065625 | Robinson, Cherly | Terminated for Cause | attendance | 33280 |
| 106 | Production | Production Technician I | 136 | 1404066949 | Osturnka, Adeel | Current Stuff | N/A - still employed | 33280 |
| 107 | Production | Production Technician I | 136 | 1408069882 | Ivey, Rose | Current Stuff | N/A - still employed | 33280 |
| 108 | Production | Production Technician I | 136 | 1410071137 | Sparks, Taylor | Current Stuff | N/A - still employed | 33280 |
| 109 | Production | Production Technician I | 136 | 1411071212 | Gonzalez, Cayo | Current Stuff | N/A - still employed | 33280 |
| 110 | Production | Production Technician I | 136 | 1202031618 | Dobrin, Denisa S | Current Stuff | N/A - still employed | 34840 |
| 111 | Production | Production Technician I | 136 | 1101023679 | Barone, Francesco A | Current Stuff | N/A - still employed | 34861 |
| 112 | Production | Production Technician I | 136 | 710007555 | Rose, Ashley | Current Stuff | N/A - still employed | 35360 |
| 113 | Production | Production Technician I | 136 | 1001735072 | Pitt, Brad | Current Stuff | N/A - still employed | 35360 |
| 114 | Production | Production Technician I | 136 | 1011022887 | Robinson, Elias | Current Stuff | N/A - still employed | 35360 |
| 115 | Production | Production Technician I | 136 | 1104025243 | Estremera, Miguel | Terminated for Cause | attendance | 35360 |
| 116 | Production | Production Technician I | 136 | 1105025721 | Johnson, George | Voluntarily Terminated | more money | 35360 |
| 117 | Production | Production Technician I | 136 | 1110029777 | Becker, Scott | Current Stuff | N/A - still employed | 35360 |
| 118 | Production | Production Technician I | 136 | 1209049259 | Mahoney, Lauren | Current Stuff | N/A - still employed | 35360 |
| 119 | Production | Production Technician I | 136 | 1304055987 | Langton, Enrico | Current Stuff | N/A - still employed | 35360 |
| 120 | Production | Production Technician I | 136 | 1307059944 | Peterson, Kayla | Current Stuff | N/A - still employed | 35360 |
| 121 | Production | Production Technician I | 136 | 1307060083 | Baczenski, Rachael | Voluntarily Terminated | Another position | 35360 |
| 122 | Production | Production Technician I | 136 | 1406067865 | Eaton, Marianne | Voluntarily Terminated | military | 35360 |
| 123 | Production | Production Technician I | 136 | 1408069539 | Gold, Shenice | Current Stuff | N/A - still employed | 35360 |
| 124 | Production | Production Technician I | 136 | 1104025414 | Garneau, Hamish | Current Stuff | N/A - still employed | 37440 |
| 125 | Production | Production Technician I | 136 | 1107027392 | Evensen, April | Terminated for Cause | no-call, no-show | 37440 |
| 126 | Production | Production Technician I | 136 | 1109029531 | Panjwani, Nina | Voluntarily Terminated | Another position | 37440 |
| 127 | Production | Production Technician I | 136 | 1204033041 | Ndzi, Colombui | Voluntarily Terminated | return to school | 37440 |
| 128 | Production | Production Technician I | 136 | 1212051962 | Barton, Nader | Voluntarily Terminated | Another position | 37440 |
| 129 | Production | Production Technician I | 136 | 1302053339 | Fernandes, Nilson | Current Stuff | N/A - still employed | 37440 |
| 130 | Production | Production Technician I | 136 | 1305057440 | Pham, Hong | Voluntarily Terminated | more money | 37440 |
| 131 | Production | Production Technician I | 136 | 1405067492 | Rossetti, Bruno | Voluntarily Terminated | Another position | 37440 |
| 132 | Production | Production Technician I | 136 | 1206044851 | O'hare, Lynn | Terminated for Cause | performance | 38480 |
| 133 | Production | Production Technician I | 136 | 1001138521 | Lynch, Lindsay | Voluntarily Terminated | Another position | 39520 |
| 134 | Production | Production Technician I | 136 | 1002017900 | Heitzman, Anthony | Current Stuff | N/A - still employed | 39520 |
| 135 | Production | Production Technician I | 136 | 1102024274 | Power, Morissa | Voluntarily Terminated | Another position | 39520 |
| 136 | Production | Production Technician I | 136 | 1201031310 | Sullivan, Timothy | Current Stuff | N/A - still employed | 39520 |
| 137 | Production | Production Technician I | 136 | 1203032235 | Veera, Abdellah | Voluntarily Terminated | maternity leave - did not return | 39520 |
| 138 | Production | Production Technician I | 136 | 1203032357 | Nguyen, Lei-Ming | Current Stuff | N/A - still employed | 39520 |
| 139 | Production | Production Technician I | 136 | 1301052462 | Keatts, Kramer | Current Stuff | N/A - still employed | 39520 |
| 140 | Production | Production Technician I | 136 | 1309061015 | Garcia, Raul | Current Stuff | N/A - still employed | 39520 |
| 141 | Production | Production Technician I | 136 | 1408069409 | Leach, Dallas | Voluntarily Terminated | return to school | 39520 |
| 142 | Production | Production Technician I | 136 | 1412071713 | Jhaveri, Sneha | Current Stuff | N/A - still employed | 39520 |
| 143 | Production | Production Technician I | 136 | 1501072192 | Gentry, Mildred | Current Stuff | N/A - still employed | 39520 |
| 144 | Production | Production Technician I | 136 | 1305057282 | Chan, Lin | Current Stuff | N/A - still employed | 40560 |
| 145 | Production | Production Technician I | 136 | 1110029602 | Harrington, Christie | Voluntarily Terminated | retiring | 41080 |
| 146 | Production | Production Technician I | 136 | 1311063172 | Crimmings, Jean | Current Stuff | N/A - Has not started yet | 41080 |
| 147 | Production | Production Technician I | 136 | 1101023353 | Lydon, Allison | Current Stuff | N/A - still employed | 41600 |
| 148 | Production | Production Technician I | 136 | 1103024504 | Tinto, Theresa | Voluntarily Terminated | Another position | 41600 |
| 149 | Production | Production Technician I | 136 | 1106026474 | Von Massenbach, Anna | Current Stuff | N/A - Has not started yet | 41600 |
| 150 | Production | Production Technician I | 136 | 1109029256 | Medeiros, Jennifer | Current Stuff | N/A - still employed | 41600 |
| 151 | Production | Production Technician I | 136 | 1201031438 | Jackson, Maryellen | Current Stuff | N/A - still employed | 41600 |
| 152 | Production | Production Technician I | 136 | 1211050793 | Rhoads, Thomas | Voluntarily Terminated | retiring | 41600 |
| 153 | Production | Production Technician I | 136 | 1311063114 | Carey, Michael | Current Stuff | N/A - still employed | 41600 |
| 154 | Production | Production Technician I | 136 | 1401064327 | Maurice, Shana | Current Stuff | N/A - still employed | 41600 |
| 155 | Production | Production Technician I | 136 | 1404066622 | Harrison, Kara | Current Stuff | N/A - still employed | 41600 |
| 156 | Production | Production Technician I | 136 | 1404066739 | Theamstern, Sophia | Voluntarily Terminated | return to school | 41600 |
| 157 | Production | Production Technician I | 136 | 1406068293 | Brill, Donna | Voluntarily Terminated | Another position | 41600 |
| 158 | Production | Production Technician I | 136 | 1408069635 | Bugali, Josephine | Current Stuff | N/A - still employed | 41600 |
| 159 | Production | Production Technician I | 136 | 1409070522 | Adinolfi, Wilson K | Current Stuff | N/A - still employed | 41600 |
| 160 | Production | Production Technician I | 136 | 1410070998 | Saar-Beckles, Melinda | Current Stuff | N/A - Has not started yet | 41600 |
| 161 | Production | Production Technician I | 136 | 1501072124 | Desimone, Carl | Current Stuff | N/A - still employed | 41600 |
| 162 | Production | Production Technician I | 136 | 1502072511 | Ferguson, Susan | Voluntarily Terminated | military | 41600 |
| 163 | Production | Production Technician I | 136 | 706006285 | Dickinson, Geoff | Current Stuff | N/A - still employed | 43680 |
| 164 | Production | Production Technician I | 136 | 1011022814 | Volk, Colleen | Terminated for Cause | gross misconduct | 43680 |
| 165 | Production | Production Technician I | 136 | 1011022883 | Alagbe,Trina | Current Stuff | N/A - still employed | 43680 |
| 166 | Production | Production Technician I | 136 | 1101023394 | Chivukula, Enola | Voluntarily Terminated | relocation out of area | 43680 |
| 167 | Production | Production Technician I | 136 | 1101023612 | England, Rex | Current Stuff | N/A - still employed | 43680 |
| 168 | Production | Production Technician I | 136 | 1101023839 | Wilber, Barry | Voluntarily Terminated | unhappy | 43680 |
| 169 | Production | Production Technician I | 136 | 1107027450 | Jung, Judy | Voluntarily Terminated | unhappy | 43680 |
| 170 | Production | Production Technician I | 136 | 1208048229 | Rarrick, Quinn | Voluntarily Terminated | more money | 43680 |
| 171 | Production | Production Technician I | 136 | 1302053044 | Newman, Richard | Current Stuff | N/A - still employed | 43680 |
| 172 | Production | Production Technician I | 136 | 1302053362 | Sander, Kamrin | Current Stuff | N/A - still employed | 43680 |
| 173 | Production | Production Technician I | 136 | 1308060535 | Sadki, Nore | Voluntarily Terminated | relocation out of area | 43680 |
| 174 | Production | Production Technician I | 136 | 1311062610 | Kretschmer, John | Current Stuff | N/A - still employed | 43680 |
| 175 | Production | Production Technician I | 136 | 1405067138 | Squatrito, Kristen | Voluntarily Terminated | unhappy | 43680 |
| 176 | Production | Production Technician I | 136 | 1406068241 | Harrell, Ludwick | Current Stuff | N/A - still employed | 43680 |
| 177 | Production | Production Technician I | 136 | 1503072857 | Jacobi, Hannah | Current Stuff | N/A - still employed | 43680 |
| 178 | Production | Production Technician I | 136 | 1007020403 | Engdahl, Jean | Current Stuff | N/A - still employed | 44200 |
| 179 | Production | Production Technician I | 136 | 710007401 | Kinsella, Kathleen | Voluntarily Terminated | more money | 45760 |
| 180 | Production | Production Technician I | 136 | 1001268402 | Ybarra, Catherine | Voluntarily Terminated | Another position | 45760 |
| 181 | Production | Production Technician I | 136 | 1012023152 | Trang, Mei | Current Stuff | N/A - still employed | 45760 |
| 182 | Production | Production Technician I | 136 | 1012023295 | Cierpiszewski, Caroline | Current Stuff | N/A - still employed | 45760 |
| 183 | Production | Production Technician I | 136 | 1103024335 | Owad, Clinton | Current Stuff | N/A - still employed | 45760 |
| 184 | Production | Production Technician I | 136 | 1106026896 | Stoica, Rick | Current Stuff | N/A - still employed | 45760 |
| 185 | Production | Production Technician I | 136 | 1109029186 | Perry, Shakira | Voluntarily Terminated | medical issues | 45760 |
| 186 | Production | Production Technician I | 136 | 1111030129 | Chang, Donovan E | Current Stuff | N/A - still employed | 45760 |
| 187 | Production | Production Technician I | 136 | 1204033041 | Ndzi, Horia | Voluntarily Terminated | more money | 45760 |
| 188 | Production | Production Technician I | 136 | 1209048696 | DiNocco, Lily | Current Stuff | N/A - still employed | 45760 |
| 189 | Production | Production Technician I | 136 | 1212051409 | Bachiochi, Linda | Current Stuff | N/A - still employed | 45760 |
| 190 | Production | Production Technician I | 136 | 1301052124 | Athwal, Sam | Current Stuff | N/A - still employed | 45760 |
| 191 | Production | Production Technician I | 136 | 1308060622 | Gerke, Melisa | Voluntarily Terminated | hours | 45760 |
| 192 | Production | Production Technician I | 136 | 1312063507 | Barbara, Thomas | Voluntarily Terminated | unhappy | 45760 |
| 193 | Production | Production Technician I | 136 | 1403066194 | Beatrice, Courtney | Current Stuff | N/A - still employed | 45760 |
| 194 | Production | Production Technician I | 136 | 1405067642 | Rivera, Haley | Current Stuff | N/A - still employed | 45760 |
| 195 | Production | Production Technician I | 136 | 1412071844 | Biden, Lowan M | Current Stuff | N/A - still employed | 45760 |
| 196 | Production | Production Technician I | 136 | 803009012 | Ferreira, Violeta | Current Stuff | N/A - still employed | 47840 |
| 197 | Production | Production Technician I | 136 | 1304055947 | Anderson, Linda | Current Stuff | N/A - still employed | 47840 |
| 198 | Production | Production Technician I | 136 | 1307060212 | Whittier, Scott | Voluntarily Terminated | hours | 47840 |
| 199 | Production | Production Technician I | 136 | 1308060754 | Mangal, Debbie | Current Stuff | N/A - still employed | 47840 |
| 200 | Production | Production Technician I | 136 | 1306058816 | DeGweck, James | Voluntarily Terminated | unhappy | 48880 |
| 201 | Production | Production Technician I | 136 | 903013071 | Kirill, Alexandra | Voluntarily Terminated | more money | 49920 |
| 202 | Production | Production Technician I | 136 | 909015167 | Mckenna, Sandy | Current Stuff | N/A - still employed | 49920 |
| 203 | Production | Production Technician I | 136 | 1006020020 | Fidelia, Libby | Current Stuff | N/A - still employed | 49920 |
| 204 | Production | Production Technician I | 136 | 1105026041 | Gaul, Barbara | Current Stuff | N/A - still employed | 49920 |
| 205 | Production | Production Technician I | 136 | 1206038000 | Robinson, Alain | Voluntarily Terminated | attendance | 49920 |
| 206 | Production | Production Technician I | 136 | 1307060058 | Pelech, Emil | Voluntarily Terminated | career change | 49920 |
| 207 | Production | Production Technician I | 136 | 1312063675 | Goyal, Roxana | Current Stuff | N/A - still employed | 49920 |
| 208 | Production | Production Technician I | 136 | 1405067064 | Handschiegl, Joanne | Current Stuff | N/A - still employed | 49920 |
| 209 | Production | Production Technician I | 136 | 1409070255 | Tippett, Jeanette | Current Stuff | N/A - still employed | 49920 |
| 210 | Production | Production Technician I | 136 | 1501071909 | Smith, Sade | Current Stuff | N/A - still employed | 50960 |
| 211 | Production | Production Technician I | 136 | 1407069280 | Clukey, Elijian | Current Stuff | N/A - Has not started yet | 51480 |
| 212 | Production | Production Technician II | 57 | 1001970770 | Smith, Joe | Current Stuff | N/A - still employed | 45760 |
| 213 | Production | Production Technician II | 57 | 1010022030 | Langford, Lindsey | Voluntarily Terminated | Another position | 45760 |
| 214 | Production | Production Technician II | 57 | 1011022818 | Walker, Roger | Current Stuff | N/A - still employed | 45760 |
| 215 | Production | Production Technician II | 57 | 1101023457 | Buccheri, Joseph | Current Stuff | N/A - still employed | 45760 |
| 216 | Production | Production Technician II | 57 | 1107027551 | Trzeciak, Cybil | Voluntarily Terminated | unhappy | 45760 |
| 217 | Production | Production Technician II | 57 | 1205033180 | Wolk, Hang T | Current Stuff | N/A - still employed | 45760 |
| 218 | Production | Production Technician II | 57 | 1405067565 | Linden, Mathew | Current Stuff | N/A - still employed | 45760 |
| 219 | Production | Production Technician II | 57 | 1499902991 | Robertson, Peter | Voluntarily Terminated | Another position | 45760 |
| 220 | Production | Production Technician II | 57 | 1008020942 | Jeannite, Tayana | Current Stuff | N/A - still employed | 46800 |
| 221 | Production | Production Technician II | 57 | 1011022777 | Thibaud, Kenneth | Voluntarily Terminated | military | 47840 |
| 222 | Production | Production Technician II | 57 | 1012023103 | Sloan, Constance | Voluntarily Terminated | maternity leave - did not return | 47840 |
| 223 | Production | Production Technician II | 57 | 1110029623 | Manchester, Robyn | Current Stuff | N/A - Has not started yet | 47840 |
| 224 | Production | Production Technician II | 57 | 1304055986 | Mancuso, Karen | Voluntarily Terminated | Another position | 47840 |
| 225 | Production | Production Technician II | 57 | 1306058509 | Huynh, Ming | Voluntarily Terminated | unhappy | 47840 |
| 226 | Production | Production Technician II | 57 | 1011022932 | Hendrickson, Trina | Voluntarily Terminated | hours | 49920 |
| 227 | Production | Production Technician II | 57 | 1012023204 | Foreman, Tanya | Voluntarily Terminated | career change | 49920 |
| 228 | Production | Production Technician II | 57 | 1105025661 | Erilus, Angela | Current Stuff | N/A - still employed | 49920 |
| 229 | Production | Production Technician II | 57 | 1305056276 | Lundy, Susan | Voluntarily Terminated | more money | 49920 |
| 230 | Production | Production Technician II | 57 | 1307059937 | Hankard, Earnest | Current Stuff | N/A - still employed | 49920 |
| 231 | Production | Production Technician II | 57 | 1402065085 | Fancett, Nicole | Current Stuff | N/A - still employed | 49920 |
| 232 | Production | Production Technician II | 57 | 1001549006 | Good, Susan | Current Stuff | N/A - still employed | 50440 |
| 233 | Production | Production Technician II | 57 | 1012023010 | Woodson, Jason | Current Stuff | N/A - still employed | 50440 |
| 234 | Production | Production Technician II | 57 | 1001103149 | Monterro, Luisa | Current Stuff | N/A - still employed | 52000 |
| 235 | Production | Production Technician II | 57 | 1001856521 | Oliver, Brooke | Voluntarily Terminated | unhappy | 52000 |
| 236 | Production | Production Technician II | 57 | 1011022820 | Burke, Joelle | Current Stuff | N/A - still employed | 52000 |
| 237 | Production | Production Technician II | 57 | 1012023226 | Cloninger, Jennifer | Voluntarily Terminated | unhappy | 52000 |
| 238 | Production | Production Technician II | 57 | 1106026433 | Hunts, Julissa | Current Stuff | N/A - Has not started yet | 52000 |
| 239 | Production | Production Technician II | 57 | 1201031274 | Davis, Daniel | Current Stuff | N/A - still employed | 52000 |
| 240 | Production | Production Technician II | 57 | 1205033439 | Miller, Ned | Voluntarily Terminated | unhappy | 52000 |
| 241 | Production | Production Technician II | 57 | 1306057810 | Johnston, Yen | Current Stuff | N/A - still employed | 52000 |
| 242 | Production | Production Technician II | 57 | 1008021030 | Bondwell, Betsy | Voluntarily Terminated | career change | 54080 |
| 243 | Production | Production Technician II | 57 | 1103024843 | Petingill, Shana | Current Stuff | N/A - still employed | 54080 |
| 244 | Production | Production Technician II | 57 | 1209048697 | Close, Phil | Voluntarily Terminated | career change | 54080 |
| 245 | Production | Production Technician II | 57 | 1301052449 | Burkett, Benjamin | Current Stuff | N/A - still employed | 54080 |
| 246 | Production | Production Technician II | 57 | 1402065340 | Roberson, May | Voluntarily Terminated | return to school | 54080 |
| 247 | Production | Production Technician II | 57 | 1406067957 | McCarthy, Brigit | Current Stuff | N/A - still employed | 54080 |
| 248 | Production | Production Technician II | 57 | 1408069503 | Moran, Patrick | Current Stuff | N/A - still employed | 54080 |
| 249 | Production | Production Technician II | 57 | 1001504432 | Lunquist, Lisa | Current Stuff | N/A - still employed | 54288 |
| 250 | Production | Production Technician II | 57 | 1104025435 | Nowlan, Kristie | Current Stuff | N/A - still employed | 54891 |
| 251 | Production | Production Technician II | 57 | 1108028351 | Gosciminski, Phylicia | Current Stuff | N/A - still employed | 56160 |
| 252 | Production | Production Technician II | 57 | 1108028428 | Faller, Megan | Current Stuff | N/A - still employed | 56160 |
| 253 | Production | Production Technician II | 57 | 1303054329 | Beak, Kimberly | Current Stuff | N/A - Has not started yet | 56160 |
| 254 | Production | Production Technician II | 57 | 1403066125 | Blount, Dianna | Current Stuff | N/A - still employed | 56160 |
| 255 | Production | Production Technician II | 57 | 1404066711 | Monkfish, Erasumus | Current Stuff | N/A - still employed | 56160 |
| 256 | Production | Production Technician II | 57 | 1103024924 | Hutter, Rosalie | Current Stuff | N/A - Has not started yet | 58240 |
| 257 | Production | Production Technician II | 57 | 1104025486 | Latif, Mohammed | Voluntarily Terminated | more money | 58240 |
| 258 | Production | Production Technician II | 57 | 1202031821 | Pelletier, Ermine | Voluntarily Terminated | unhappy | 58240 |
| 259 | Production | Production Technician II | 57 | 1207046956 | Homberger, Adrienne J | Voluntarily Terminated | relocation out of area | 58240 |
| 260 | Production | Production Technician II | 57 | 1406068345 | Tejeda, Lenora | Voluntarily Terminated | Another position | 59800 |
| 261 | Production | Production Technician II | 57 | 1005019209 | Akinkuolie, Sarah | Voluntarily Terminated | hours | 60320 |
| 262 | Production | Production Technician II | 57 | 1104025179 | Demita, Carla | Voluntarily Terminated | more money | 60320 |
| 263 | Production | Production Technician II | 57 | 1106026462 | Sahoo, Adil | Current Stuff | N/A - still employed | 60320 |
| 264 | Production | Production Technician II | 57 | 1109029103 | Fitzpatrick, Michael J | Voluntarily Terminated | hours | 60320 |
| 265 | Production | Production Technician II | 57 | 1301052436 | Moumanil, Maliki | Current Stuff | N/A - still employed | 60320 |
| 266 | Production | Production Technician II | 57 | 1405067188 | Winthrop, Jordan | Voluntarily Terminated | retiring | 60320 |
| 267 | Production | Production Technician II | 57 | 1411071324 | Gonzalez, Juan | Voluntarily Terminated | career change | 60320 |
| 268 | Production | Production Technician II | 57 | 1411071406 | Peters, Lauren | Voluntarily Terminated | more money | 60320 |
| 269 | Sales | Area Sales Manager | 27 | 1411071295 | Strong, Caitrin | Current Stuff | N/A - still employed | 112320 |
| 270 | Sales | Area Sales Manager | 27 | 812011761 | Ozark, Travis | Current Stuff | N/A - still employed | 114400 |
| 271 | Sales | Area Sales Manager | 27 | 1001167253 | Guilianno, Mike | Voluntarily Terminated | relocation out of area | 114400 |
| 272 | Sales | Area Sales Manager | 27 | 1102024106 | Potts, Xana | Current Stuff | N/A - still employed | 114400 |
| 273 | Sales | Area Sales Manager | 27 | 1104025008 | Khemmich, Bartholemew | Current Stuff | N/A - still employed | 114400 |
| 274 | Sales | Area Sales Manager | 27 | 1111030684 | Nguyen, Dheepa | Current Stuff | N/A - still employed | 114400 |
| 275 | Sales | Area Sales Manager | 27 | 1203032099 | Givens, Myriam | Current Stuff | N/A - still employed | 114400 |
| 276 | Sales | Area Sales Manager | 27 | 1209049326 | McKinzie, Jac | Current Stuff | N/A - Has not started yet | 114400 |
| 277 | Sales | Area Sales Manager | 27 | 1302053046 | Gill, Whitney | Terminated for Cause | attendance | 114400 |
| 278 | Sales | Area Sales Manager | 27 | 1306057978 | Mullaney, Howard | Current Stuff | N/A - still employed | 114400 |
| 279 | Sales | Area Sales Manager | 27 | 1312063714 | Valentin,Jackie | Current Stuff | N/A - still employed | 114400 |
| 280 | Sales | Area Sales Manager | 27 | 1401064637 | Terry, Sharlene | Current Stuff | N/A - still employed | 114400 |
| 281 | Sales | Area Sales Manager | 27 | 1403065721 | Carter, Michelle | Current Stuff | N/A - still employed | 114400 |
| 282 | Sales | Area Sales Manager | 27 | 1408069481 | Dietrich, Jenna | Current Stuff | N/A - still employed | 114400 |
| 283 | Sales | Area Sales Manager | 27 | 1409070567 | Costa, Latia | Current Stuff | N/A - still employed | 114400 |
| 284 | Sales | Area Sales Manager | 27 | 1411071302 | Fraval, Maruk | Current Stuff | N/A - still employed | 114400 |
| 285 | Sales | Area Sales Manager | 27 | 1412071660 | Leruth, Giovanni | Current Stuff | N/A - still employed | 114400 |
| 286 | Sales | Area Sales Manager | 27 | 1502072711 | Riordan, Michael | Current Stuff | N/A - still employed | 114400 |
| 287 | Sales | Area Sales Manager | 27 | 1504073313 | Buck, Edward | Current Stuff | N/A - still employed | 114400 |
| 288 | Sales | Area Sales Manager | 27 | 1504073368 | Bunbury, Jessica | Voluntarily Terminated | Another position | 114400 |
| 289 | Sales | Area Sales Manager | 27 | 1204032843 | Friedman, Gerry | Current Stuff | N/A - still employed | 115440 |
| 290 | Sales | Area Sales Manager | 27 | 1411071481 | Gonzales, Ricardo | Current Stuff | N/A - still employed | 115440 |
| 291 | Sales | Area Sales Manager | 27 | 1001084890 | Jeremy Prater | Current Stuff | N/A - still employed | 116480 |
| 292 | Sales | Area Sales Manager | 27 | 1111030503 | Villanueva, Noah | Current Stuff | N/A - still employed | 116480 |
| 293 | Sales | Area Sales Manager | 27 | 1209048771 | Martins, Joseph | Current Stuff | N/A - still employed | 116480 |
| 294 | Sales | Area Sales Manager | 27 | 1306059197 | Digitale, Alfred | Current Stuff | N/A - still employed | 116480 |
| 295 | Sales | Area Sales Manager | 27 | 1501072180 | Onque, Jasmine | Current Stuff | N/A - still employed | 118560 |
| 296 | Sales | Director of Sales | 1 | 1009021646 | Houlihan, Debra | Current Stuff | N/A - still employed | 124800 |
| 297 | Sales | Sales Manager | 3 | 1402065303 | Daneault, Lynn | Current Stuff | N/A - still employed | 112320 |
| 298 | Sales | Sales Manager | 3 | 1499902910 | Smith, John | Current Stuff | N/A - still employed | 116480 |
| 299 | Sales | Sales Manager | 3 | 1109029264 | Kampew, Donysha | Voluntarily Terminated | maternity leave - did not return | 125320 |
| 300 | Software Engineering | Software Engineer | 9 | 1102024057 | True, Edward | Voluntarily Terminated | medical issues | 94474 |
| 301 | Software Engineering | Software Engineer | 9 | 1107027358 | Andreola, Colby | Current Stuff | N/A - still employed | 99008 |
| 302 | Software Engineering | Software Engineer | 9 | 1201031324 | Szabo, Andrew | Current Stuff | N/A - still employed | 99840 |
| 303 | Software Engineering | Software Engineer | 9 | 1401064670 | Exantus, Susan | Terminated for Cause | attendance | 100880 |
| 304 | Software Engineering | Software Engineer | 9 | 1012023185 | Saada, Adell | Current Stuff | N/A - still employed | 102440 |
| 305 | Software Engineering | Software Engineer | 9 | 1112030979 | Patronick, Luke | Voluntarily Terminated | Another position | 108680 |
| 306 | Software Engineering | Software Engineer | 9 | 1303054625 | Martin, Sandra | Current Stuff | N/A - still employed | 115461 |
| 307 | Software Engineering | Software Engineer | 9 | 1101023577 | Carabbio, Judith | Current Stuff | N/A - still employed | 116480 |
| 308 | Software Engineering | Software Engineer | 9 | 1203032498 | Del Bosque, Keyla | Current Stuff | N/A - still employed | 118810 |
| 309 | Software Engineering | Software Engineering Manager | 1 | 1001644719 | Sweetwater, Alex | Current Stuff | N/A - still employed | 56160 |
# Build a grid of plots showing termination reasons and annual pay by position and department
g = sns.catplot(
    kind="strip",
    x="position",
    y="usd_per_year",
    hue="Reason For Term",
    data=dfg_TermReasonAndPayrate,
    col="Employment Status",
    row="department",
    col_order=["Current Stuff", "Voluntarily Terminated", "Terminated for Cause"],
    height=4.5,
    aspect=0.8,
    palette="tab20",
    margin_titles=True,
    sharex=False,
    sharey=True,
    marker='8',
    s=8,
    jitter=True,
    alpha=1)
g.fig.suptitle(
    "Termination reason by annual salary, position, and department",
    fontsize=16, x=0.50, y=1.01)
# Add a horizontal line at the median pay rate to each facet
# To do this, define a function that draws the line and its label
def pay_rate_line(y, **kwargs):
    # g.map passes only the rows of the current facet, so the median is computed
    # for one department and one employment status at a time
    facet_median = y.median()
    # Draw the median line from the y values passed in; set color, width, and style
    plt.axhline(facet_median, color='maroon', linewidth=0.75, linestyle=':')
    # Label the median line
    plt.annotate(
        text=f"median: {facet_median:,.0f}",  # Label text with the median value
        xy=(0, 1),  # Text position
        xycoords="axes fraction",
        horizontalalignment='left',  # Horizontal alignment of the text
        verticalalignment='top',  # Vertical alignment of the text
        rotation=None,  # Label rotation
        color='maroon',  # Text color
        alpha=0.75,
        fontsize=10
    )
# Map the median-line function onto every facet of the grid
# The function receives the annual pay values
g.map(pay_rate_line, 'usd_per_year')
# Set the font size of the x-axis label
g.set_xlabels(fontsize=10)
# Set the font size of the y-axis label
g.set_ylabels(fontsize=10)
# Set the font size and rotation of the tick labels
g.set_xticklabels(rotation=90, fontsize=9)
# Adjust the vertical spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
plt.show()
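The median line is drawn per facet, that is, for each combination of department and employment status. A minimal sketch of that grouping in plain Python, using a few rows copied from the table above; the helper name `facet_medians` is hypothetical and not part of the notebook:

```python
from collections import defaultdict
from statistics import median

# A few (department, employment status, usd_per_year) rows copied from the table above.
# "Current Stuff" is spelled as it appears in the dataset.
rows = [
    ("Sales", "Current Stuff", 112320),
    ("Sales", "Current Stuff", 114400),
    ("Sales", "Voluntarily Terminated", 114400),
    ("Software Engineering", "Current Stuff", 99008),
    ("Software Engineering", "Current Stuff", 99840),
    ("Software Engineering", "Current Stuff", 102440),
]

def facet_medians(rows):
    """Hypothetical helper: median pay for each (department, status) pair."""
    groups = defaultdict(list)
    for department, status, pay in rows:
        groups[(department, status)].append(pay)
    return {key: median(values) for key, values in groups.items()}

medians = facet_medians(rows)
for (department, status), value in sorted(medians.items()):
    print(f"{department} / {status}: {value}")
```

With the full DataFrame the same numbers would come from grouping on both columns; the sketch only illustrates why the value differs from one facet to the next.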
CONCLUSION
sql_quiery = \
"""
-- Build a query and a DataFrame with termination reasons and the managers' names.
-- Only terminated employees are of interest.
-- The DataFrame is used to build the grid of plots.
SELECT
"Manager Name",
Department,
COUNT("Employee Number"),
-- The same reason wording can occur both for voluntary terminations and for terminations
-- initiated by management. To avoid overloading the table and the plot with one more field,
-- we encode that distinction directly in the termination reason:
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN
CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN
CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS "Reason For Term"
FROM hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
GROUP BY
"Manager Name",
Department,
"Employment Status",
"Reason For Term"
ORDER BY
"Reason For Term",
Department,
"Manager Name"
;
"""
dfg_TermReasonAndManager = pd.read_sql(sql_quiery, conn)
dfg_TermReasonAndManager
| | Manager Name | department | count | Reason For Term |
|---|---|---|---|---|
| 0 | Janet King | Production | 1 | ADM: attendance |
| 1 | Kelley Spirea | Production | 1 | ADM: attendance |
| 2 | Ketsia Liebig | Production | 1 | ADM: attendance |
| 3 | Michael Albert | Production | 1 | ADM: attendance |
| 4 | John Smith | Sales | 1 | ADM: attendance |
| 5 | Alex Sweetwater | Software Engineering | 1 | ADM: attendance |
| 6 | Kelley Spirea | Production | 1 | ADM: gross misconduct |
| 7 | Simon Roup | IT/IS | 1 | ADM: hours |
| 8 | Simon Roup | IT/IS | 2 | ADM: no-call, no-show |
| 9 | Elijiah Gray | Production | 1 | ADM: no-call, no-show |
| 10 | Simon Roup | IT/IS | 1 | ADM: performance |
| 11 | Kissy Sullivan | Production | 2 | ADM: performance |
| 12 | Simon Roup | IT/IS | 1 | SELF: Another position |
| 13 | Amy Dunn | Production | 1 | SELF: Another position |
| 14 | Brannon Miller | Production | 1 | SELF: Another position |
| 15 | David Stanley | Production | 3 | SELF: Another position |
| 16 | Elijiah Gray | Production | 2 | SELF: Another position |
| 17 | Janet King | Production | 2 | SELF: Another position |
| 18 | Kelley Spirea | Production | 1 | SELF: Another position |
| 19 | Ketsia Liebig | Production | 1 | SELF: Another position |
| 20 | Kissy Sullivan | Production | 3 | SELF: Another position |
| 21 | Michael Albert | Production | 1 | SELF: Another position |
| 22 | Webster Butler | Production | 2 | SELF: Another position |
| 23 | John Smith | Sales | 1 | SELF: Another position |
| 24 | Alex Sweetwater | Software Engineering | 1 | SELF: Another position |
| 25 | Amy Dunn | Production | 1 | SELF: attendance |
| 26 | Brandon R. LeBlanc | Admin Offices | 1 | SELF: career change |
| 27 | Janet King | Admin Offices | 1 | SELF: career change |
| 28 | Simon Roup | IT/IS | 1 | SELF: career change |
| 29 | Brannon Miller | Production | 1 | SELF: career change |
| 30 | David Stanley | Production | 1 | SELF: career change |
| 31 | Elijiah Gray | Production | 2 | SELF: career change |
| 32 | Ketsia Liebig | Production | 1 | SELF: career change |
| 33 | Webster Butler | Production | 1 | SELF: career change |
| 34 | Jennifer Zamora | IT/IS | 1 | SELF: hours |
| 35 | Simon Roup | IT/IS | 1 | SELF: hours |
| 36 | Amy Dunn | Production | 2 | SELF: hours |
| 37 | Elijiah Gray | Production | 1 | SELF: hours |
| 38 | Kelley Spirea | Production | 1 | SELF: hours |
| 39 | Kissy Sullivan | Production | 1 | SELF: hours |
| 40 | Webster Butler | Production | 1 | SELF: hours |
| 41 | Kissy Sullivan | Production | 1 | SELF: maternity leave - did not return |
| 42 | Michael Albert | Production | 1 | SELF: maternity leave - did not return |
| 43 | Debra Houlihan | Sales | 1 | SELF: maternity leave - did not return |
| 44 | Peter Monroe | IT/IS | 1 | SELF: medical issues |
| 45 | Amy Dunn | Production | 1 | SELF: medical issues |
| 46 | Alex Sweetwater | Software Engineering | 1 | SELF: medical issues |
| 47 | Kissy Sullivan | Production | 1 | SELF: military |
| 48 | Webster Butler | Production | 3 | SELF: military |
| 49 | Amy Dunn | Production | 2 | SELF: more money |
| 50 | Brannon Miller | Production | 1 | SELF: more money |
| 51 | Kelley Spirea | Production | 2 | SELF: more money |
| 52 | Ketsia Liebig | Production | 1 | SELF: more money |
| 53 | Kissy Sullivan | Production | 1 | SELF: more money |
| 54 | Michael Albert | Production | 3 | SELF: more money |
| 55 | Webster Butler | Production | 1 | SELF: more money |
| 56 | Simon Roup | IT/IS | 1 | SELF: performance |
| 57 | Amy Dunn | Production | 1 | SELF: relocation out of area |
| 58 | Ketsia Liebig | Production | 1 | SELF: relocation out of area |
| 59 | Michael Albert | Production | 2 | SELF: relocation out of area |
| 60 | John Smith | Sales | 1 | SELF: relocation out of area |
| 61 | Brannon Miller | Production | 1 | SELF: retiring |
| 62 | Elijiah Gray | Production | 1 | SELF: retiring |
| 63 | Janet King | Production | 1 | SELF: retiring |
| 64 | Webster Butler | Production | 1 | SELF: retiring |
| 65 | Amy Dunn | Production | 1 | SELF: return to school |
| 66 | David Stanley | Production | 1 | SELF: return to school |
| 67 | Kissy Sullivan | Production | 1 | SELF: return to school |
| 68 | Webster Butler | Production | 2 | SELF: return to school |
| 69 | Amy Dunn | Production | 4 | SELF: unhappy |
| 70 | Brannon Miller | Production | 2 | SELF: unhappy |
| 71 | David Stanley | Production | 1 | SELF: unhappy |
| 72 | Elijiah Gray | Production | 1 | SELF: unhappy |
| 73 | Janet King | Production | 1 | SELF: unhappy |
| 74 | Kissy Sullivan | Production | 2 | SELF: unhappy |
| 75 | Michael Albert | Production | 1 | SELF: unhappy |
| 76 | Webster Butler | Production | 2 | SELF: unhappy |
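The `SELF:` and `ADM:` prefixes in the table come from the `CASE ... CONCAT_WS` expression in the query. A minimal sketch of the same labelling rule in plain Python; the function name `label_reason` is hypothetical and serves only to illustrate the logic:

```python
def label_reason(status, reason):
    """Hypothetical helper mirroring the CASE ... CONCAT_WS expression in the query."""
    prefixes = {
        "Voluntarily Terminated": "SELF",  # left on the employee's initiative
        "Terminated for Cause": "ADM",     # dismissed by management
    }
    prefix = prefixes.get(status)
    if prefix is None:
        # The SQL CASE has no ELSE branch, so any other status yields NULL
        return None
    # CONCAT_WS ignores NULL arguments, so a missing reason leaves only the prefix
    return prefix if reason is None else f"{prefix}: {reason}"

print(label_reason("Voluntarily Terminated", "hours"))
print(label_reason("Terminated for Cause", "hours"))
```

The two calls use the same reason wording, `hours`, which occurs in the data under both statuses; the prefix is what keeps the two cases apart on the plot.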
# Build a grid of plots showing termination reasons by manager,
# broken down by department
g = sns.catplot(
    kind="bar",
    x="Manager Name",
    y="count",
    hue="Reason For Term",
    data=dfg_TermReasonAndManager,
    col="department",
    col_wrap=1,  # one plot per row, wide enough for the Production data to be readable
    height=4.5,
    aspect=5,
    palette="tab20",
    margin_titles=True,
    sharex=False,
    sharey=False,
    alpha=1.0,
    dodge=True,
    legend=False,  # No legend inside the grid; a separate one is added below
    legend_out=True,  # Place the separate legend outside the plots
)
g.fig.suptitle(
    "Termination reasons by manager, broken down by department",
    fontsize=20, x=0.28, y=1.075)
# Set the font size of the x-axis label
g.set_xlabels(fontsize=14)
# Set the font size of the y-axis label
g.set_ylabels(fontsize=14)
# Set the font size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=0, fontsize=14)
# Adjust the vertical spacing between the individual plots
g.fig.subplots_adjust(hspace=0.4)
# Add a shared legend; set its bounding box (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0, 0.76, 0.6, 0.3), loc='upper center', ncol=6, title='Reasons for Termination', fontsize=14)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build a query and a DataFrame with termination reasons and length of employment.
-- Only terminated employees are of interest.
-- The DataFrame is used to build the plots.
SELECT
-- The same reason wording can occur both for voluntary terminations and for terminations
-- initiated by management. To avoid overloading the table and the plot with one more field,
-- we encode that distinction directly in the termination reason:
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN
CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN
CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS "Reason For Term",
"Days Employed"
FROM hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
ORDER BY
"Reason For Term",
"Days Employed" DESC
"""
dfg_TermReasonAndDays = pd.read_sql(sql_quiery, conn)
dfg_TermReasonAndDays
| | Reason For Term | Days Employed |
|---|---|---|
| 0 | ADM: attendance | 1954 |
| 1 | ADM: attendance | 1797 |
| 2 | ADM: attendance | 908 |
| 3 | ADM: attendance | 765 |
| 4 | ADM: attendance | 425 |
| 5 | ADM: attendance | 164 |
| 6 | ADM: gross misconduct | 1596 |
| 7 | ADM: hours | 732 |
| 8 | ADM: no-call, no-show | 27 |
| 9 | ADM: no-call, no-show | 8 |
| 10 | ADM: no-call, no-show | 6 |
| 11 | ADM: performance | 762 |
| 12 | ADM: performance | 440 |
| 13 | ADM: performance | 432 |
| 14 | SELF: Another position | 2583 |
| 15 | SELF: Another position | 2032 |
| 16 | SELF: Another position | 1468 |
| 17 | SELF: Another position | 1435 |
| 18 | SELF: Another position | 1400 |
| 19 | SELF: Another position | 1318 |
| 20 | SELF: Another position | 1083 |
| 21 | SELF: Another position | 1070 |
| 22 | SELF: Another position | 497 |
| 23 | SELF: Another position | 448 |
| 24 | SELF: Another position | 439 |
| 25 | SELF: Another position | 419 |
| 26 | SELF: Another position | 309 |
| 27 | SELF: Another position | 218 |
| 28 | SELF: Another position | 194 |
| 29 | SELF: Another position | 124 |
| 30 | SELF: Another position | 98 |
| 31 | SELF: Another position | 45 |
| 32 | SELF: Another position | 19 |
| 33 | SELF: Another position | 2 |
| 34 | SELF: attendance | 1842 |
| 35 | SELF: career change | 1636 |
| 36 | SELF: career change | 1180 |
| 37 | SELF: career change | 1114 |
| 38 | SELF: career change | 730 |
| 39 | SELF: career change | 718 |
| 40 | SELF: career change | 444 |
| 41 | SELF: career change | 399 |
| 42 | SELF: career change | 392 |
| 43 | SELF: career change | 280 |
| 44 | SELF: hours | 1395 |
| 45 | SELF: hours | 770 |
| 46 | SELF: hours | 770 |
| 47 | SELF: hours | 447 |
| 48 | SELF: hours | 299 |
| 49 | SELF: hours | 125 |
| 50 | SELF: hours | 8 |
| 51 | SELF: hours | 2 |
| 52 | SELF: maternity leave - did not return | 1990 |
| 53 | SELF: maternity leave - did not return | 1271 |
| 54 | SELF: maternity leave - did not return | 899 |
| 55 | SELF: medical issues | 1623 |
| 56 | SELF: medical issues | 421 |
| 57 | SELF: medical issues | 127 |
| 58 | SELF: military | 1179 |
| 59 | SELF: military | 1162 |
| 60 | SELF: military | 922 |
| 61 | SELF: military | 794 |
| 62 | SELF: more money | 1675 |
| 63 | SELF: more money | 1635 |
| 64 | SELF: more money | 1347 |
| 65 | SELF: more money | 1150 |
| 66 | SELF: more money | 1055 |
| 67 | SELF: more money | 514 |
| 68 | SELF: more money | 378 |
| 69 | SELF: more money | 264 |
| 70 | SELF: more money | 194 |
| 71 | SELF: more money | 105 |
| 72 | SELF: more money | 69 |
| 73 | SELF: performance | 517 |
| 74 | SELF: relocation out of area | 1602 |
| 75 | SELF: relocation out of area | 1334 |
| 76 | SELF: relocation out of area | 1265 |
| 77 | SELF: relocation out of area | 571 |
| 78 | SELF: relocation out of area | 236 |
| 79 | SELF: retiring | 1705 |
| 80 | SELF: retiring | 1436 |
| 81 | SELF: retiring | 1140 |
| 82 | SELF: retiring | 311 |
| 83 | SELF: return to school | 921 |
| 84 | SELF: return to school | 693 |
| 85 | SELF: return to school | 62 |
| 86 | SELF: return to school | 57 |
| 87 | SELF: return to school | 26 |
| 88 | SELF: unhappy | 1908 |
| 89 | SELF: unhappy | 1575 |
| 90 | SELF: unhappy | 1484 |
| 91 | SELF: unhappy | 1116 |
| 92 | SELF: unhappy | 777 |
| 93 | SELF: unhappy | 770 |
| 94 | SELF: unhappy | 602 |
| 95 | SELF: unhappy | 581 |
| 96 | SELF: unhappy | 539 |
| 97 | SELF: unhappy | 462 |
| 98 | SELF: unhappy | 267 |
| 99 | SELF: unhappy | 170 |
| 100 | SELF: unhappy | 83 |
| 101 | SELF: unhappy | 72 |
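A box plot condenses each group of `Days Employed` values into a handful of summary statistics. A minimal sketch of such a summary for one group, using the four `SELF: military` values from the table above; the helper name `five_number_summary` is hypothetical, and the quartile convention chosen here (`method="inclusive"`) is an assumption that may differ slightly from the one the plotting library applies:

```python
from statistics import quantiles

# "Days Employed" values for "SELF: military", copied from the table above
days_employed = [1179, 1162, 922, 794]

def five_number_summary(values):
    """Hypothetical helper: minimum, three quartiles, and maximum of a sample."""
    ordered = sorted(values)
    # method="inclusive" interpolates linearly between the sorted data points
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

summary = five_number_summary(days_employed)
print(summary)
```

The box spans the first to the third quartile and the line inside it marks the median. With groups this small (some reasons have only one to four observations), the boxes are best read as a rough indication rather than a stable estimate.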
# Build a plot showing termination reasons by length of employment
g = sns.catplot(
    kind="box",
    x="Reason For Term",
    y="Days Employed",
    hue="Reason For Term",
    data=dfg_TermReasonAndDays,
    height=10,
    aspect=2,
    palette="tab20",
    margin_titles=True,
    sharex=True,
    sharey=True,
    dodge=False
)
# Set the plot title
g.fig.suptitle(
    "Termination reasons by length of employment",
    fontsize=20, x=0.28, y=1.2)
# catplot builds a grid of plots. Below we address the only plot in the grid, at coordinates [0, 0]
# Set the y-axis tick step to 90 days.
g.axes[0][0].yaxis.set_major_locator(mpl.ticker.MultipleLocator(90))
# Set the y-axis range from 0 to 2700 days.
g.axes[0][0].set_ylim(0, 2700)
# Set the font size of the x-axis label
g.set_xlabels(fontsize=14)
# Set the font size of the y-axis label
g.set_ylabels(fontsize=14)
# Set the font size and rotation of the tick labels
g.set_xticklabels(rotation=80, fontsize=14)
# For readability, add a secondary y-axis on the right (see matplotlib.axes.Axes.secondary_yaxis),
# scale it in years (a year is approximated here as 360 days), and set its label and label size.
secax = g.axes[0][0].secondary_yaxis("right", functions=(lambda x: x / 360, lambda x: x * 360))
secax.set_ylabel('Years Employed', fontsize=14)
# Add a shared legend; set its bounding box (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0, 0.85, 0.575, 0.3), loc='upper center', ncol=6, title='Reasons for Termination', fontsize=14)
plt.show()
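The secondary axis needs a pair of functions that are inverses of each other: one maps days to years, the other maps years back to days. A minimal sketch of such a pair in plain Python, assuming the same 360-day year as the plotting code; the names `days_to_years` and `years_to_days` are hypothetical:

```python
DAYS_PER_YEAR = 360  # same approximation as the secondary axis in the plot

def days_to_years(days):
    """Forward transform: primary axis (days) to secondary axis (years)."""
    return days / DAYS_PER_YEAR

def years_to_days(years):
    """Inverse transform: secondary axis (years) back to days."""
    return years * DAYS_PER_YEAR

# Under this approximation every 90-day tick falls on a quarter-year mark
for days in (90, 360, 2700):
    print(days, days_to_years(days))
```

Using 365 instead of 360 would be closer to the calendar, at the cost of the 90-day ticks no longer lining up with quarter-year values on the right-hand axis.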
CONCLUSION
sql_quiery = \
"""
-- Build a query and a DataFrame with termination reasons and performance scores.
-- Only terminated employees are of interest.
-- The DataFrame is used to build the grid of plots.
SELECT
-- The same reason wording can occur both for voluntary terminations and for terminations
-- initiated by management. To avoid overloading the table and the plot with one more field,
-- we encode that distinction directly in the termination reason:
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN
CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN
CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS "Reason For Term",
"Performance Score",
COUNT("Employee Number")
FROM hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
GROUP BY
"Performance Score",
"Employment Status",
"Reason For Term"
ORDER BY
"Reason For Term",
"Performance Score"
"""
dfg_TermReasonAndScore = pd.read_sql(sql_quiery, conn)
dfg_TermReasonAndScore
| | Reason For Term | Performance Score | count |
|---|---|---|---|
| 0 | ADM: attendance | 90-day meets | 1 |
| 1 | ADM: attendance | Fully Meets | 2 |
| 2 | ADM: attendance | Needs Improvement | 3 |
| 3 | ADM: gross misconduct | Exceeds | 1 |
| 4 | ADM: hours | Fully Meets | 1 |
| 5 | ADM: no-call, no-show | Fully Meets | 1 |
| 6 | ADM: no-call, no-show | N/A- too early to review | 2 |
| 7 | ADM: performance | Fully Meets | 1 |
| 8 | ADM: performance | Needs Improvement | 1 |
| 9 | ADM: performance | PIP | 1 |
| 10 | SELF: Another position | 90-day meets | 2 |
| 11 | SELF: Another position | Exceeds | 2 |
| 12 | SELF: Another position | Fully Meets | 12 |
| 13 | SELF: Another position | N/A- too early to review | 3 |
| 14 | SELF: Another position | PIP | 1 |
| 15 | SELF: attendance | Fully Meets | 1 |
| 16 | SELF: career change | 90-day meets | 1 |
| 17 | SELF: career change | Fully Meets | 5 |
| 18 | SELF: career change | Needs Improvement | 2 |
| 19 | SELF: career change | PIP | 1 |
| 20 | SELF: hours | 90-day meets | 3 |
| 21 | SELF: hours | Fully Meets | 3 |
| 22 | SELF: hours | N/A- too early to review | 2 |
| 23 | SELF: maternity leave - did not return | Exceeds | 1 |
| 24 | SELF: maternity leave - did not return | Fully Meets | 2 |
| 25 | SELF: medical issues | Fully Meets | 3 |
| 26 | SELF: military | Fully Meets | 3 |
| 27 | SELF: military | Needs Improvement | 1 |
| 28 | SELF: more money | 90-day meets | 1 |
| 29 | SELF: more money | Exceeds | 2 |
| 30 | SELF: more money | Fully Meets | 7 |
| 31 | SELF: more money | N/A- too early to review | 1 |
| 32 | SELF: performance | Fully Meets | 1 |
| 33 | SELF: relocation out of area | 90-day meets | 1 |
| 34 | SELF: relocation out of area | Fully Meets | 4 |
| 35 | SELF: retiring | Exceeds | 1 |
| 36 | SELF: retiring | Fully Meets | 3 |
| 37 | SELF: return to school | Fully Meets | 2 |
| 38 | SELF: return to school | N/A- too early to review | 3 |
| 39 | SELF: unhappy | 90-day meets | 4 |
| 40 | SELF: unhappy | Exceeds | 1 |
| 41 | SELF: unhappy | Fully Meets | 5 |
| 42 | SELF: unhappy | N/A- too early to review | 2 |
| 43 | SELF: unhappy | Needs Improvement | 1 |
| 44 | SELF: unhappy | PIP | 1 |
# Build a grid of plots showing termination reasons by employee
# performance score
g = sns.catplot(
    kind="bar",
    x="Reason For Term",
    y="count",
    hue="Reason For Term",
    data=dfg_TermReasonAndScore,
    col="Performance Score",
    ci=None,
    col_wrap=1,
    height=3.5,
    aspect=6,
    palette="tab20",
    margin_titles=True,
    sharex=True,
    sharey=True,
    alpha=1.0,
    dodge=False,
    legend=False,  # No legend inside the grid; a separate one is added below
    legend_out=True,  # Place the separate legend outside the plots
)
g.fig.suptitle(
    "Termination reasons by performance score",
    fontsize=20, x=0.28, y=1.055)
# Set the font size of the x-axis label
g.set_xlabels(fontsize=14)
# Set the font size of the y-axis label
g.set_ylabels(fontsize=14)
# Set the font size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=80, fontsize=14)
# Adjust the vertical spacing between the individual plots
g.fig.subplots_adjust(hspace=0.4)
# Add a shared legend; set its bounding box (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0, 0.75, 0.575, 0.3), loc='upper center',
             ncol=6, title='Reasons for Termination', fontsize=14)
# Annotate each facet with the total number of terminated employees
# To do this, define a function
def empl_count_annotate(y, **kwargs):
    # Create the annotation
    plt.annotate(
        text=f"Total terminations: {sum(y)}",  # Annotation text
        xy=(0.5, 0.75),  # Text position
        xycoords="axes fraction",
        horizontalalignment='center',  # Horizontal alignment of the text
        verticalalignment='center',  # Vertical alignment of the text
        rotation=None,  # Label rotation
        color='maroon',  # Text color
        alpha=0.75,
        fontsize=18,
        # Box around the text: rounded, padding 0.3, white background, black border, line width 2
        bbox=dict(boxstyle="round, pad=0.3", fc="white", ec="black", lw=2)
    )
# Map the annotation function onto every facet of the grid
# The function receives the termination counts of the facet
g.map(empl_count_annotate, 'count')
plt.show()
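The annotation on each facet is the sum of the `count` column for one performance score. A minimal sketch of that aggregation in plain Python, using the `Exceeds` and `PIP` rows copied from the table above; the variable name `totals_by_score` is hypothetical:

```python
from collections import Counter

# (termination reason, performance score, count) rows copied from the table above
rows = [
    ("ADM: gross misconduct", "Exceeds", 1),
    ("SELF: Another position", "Exceeds", 2),
    ("SELF: maternity leave - did not return", "Exceeds", 1),
    ("SELF: more money", "Exceeds", 2),
    ("SELF: retiring", "Exceeds", 1),
    ("SELF: unhappy", "Exceeds", 1),
    ("ADM: performance", "PIP", 1),
    ("SELF: Another position", "PIP", 1),
    ("SELF: career change", "PIP", 1),
    ("SELF: unhappy", "PIP", 1),
]

# Sum the counts per performance score, as the per-facet annotation does
totals_by_score = Counter()
for _reason, score, count in rows:
    totals_by_score[score] += count

for score, total in sorted(totals_by_score.items()):
    print(f"{score}: {total}")
```

Only two of the performance scores are included here to keep the sketch short; the remaining scores would be summed in the same way.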
CONCLUSION
Largest groups among employee-initiated terminations:
Left voluntarily:
sql_quiery = \
"""
-- Build a query and a DataFrame with termination reasons and recruitment sources.
-- Only terminated employees are of interest.
-- The DataFrame is used to build the grid of plots.
SELECT
-- Department,
-- The same reason wording can occur both for voluntary terminations and for terminations
-- initiated by management. To avoid overloading the table and the plot with one more field,
-- we encode that distinction directly in the termination reason:
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN
CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN
CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS "Reason For Term",
"Employee Source",
COUNT("Employee Number")
FROM hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
GROUP BY
"Employee Source",
"Employment Status",
"Reason For Term"
ORDER BY
"Reason For Term",
"Employee Source"
"""
dfg_TermReasonAndSource = pd.read_sql(sql_quiery, conn)
dfg_TermReasonAndSource
| | Reason For Term | Employee Source | count |
|---|---|---|---|
| 0 | ADM: attendance | Billboard | 1 |
| 1 | ADM: attendance | Employee Referral | 1 |
| 2 | ADM: attendance | Glassdoor | 1 |
| 3 | ADM: attendance | Monster.com | 1 |
| 4 | ADM: attendance | Social Networks - Facebook Twitter etc | 1 |
| 5 | ADM: attendance | Word of Mouth | 1 |
| 6 | ADM: gross misconduct | Social Networks - Facebook Twitter etc | 1 |
| 7 | ADM: hours | Vendor Referral | 1 |
| 8 | ADM: no-call, no-show | Employee Referral | 1 |
| 9 | ADM: no-call, no-show | Glassdoor | 1 |
| 10 | ADM: no-call, no-show | Word of Mouth | 1 |
| 11 | ADM: performance | MBTA ads | 1 |
| 12 | ADM: performance | Professional Society | 1 |
| 13 | ADM: performance | Search Engine - Google Bing Yahoo | 1 |
| 14 | SELF: Another position | Diversity Job Fair | 6 |
| 15 | SELF: Another position | Glassdoor | 2 |
| 16 | SELF: Another position | Internet Search | 1 |
| 17 | SELF: Another position | Newspager/Magazine | 1 |
| 18 | SELF: Another position | On-line Web application | 1 |
| 19 | SELF: Another position | Search Engine - Google Bing Yahoo | 4 |
| 20 | SELF: Another position | Social Networks - Facebook Twitter etc | 4 |
| 21 | SELF: Another position | Vendor Referral | 1 |
| 22 | SELF: attendance | Glassdoor | 1 |
| 23 | SELF: career change | Company Intranet - Partner | 1 |
| 24 | SELF: career change | Diversity Job Fair | 3 |
| 25 | SELF: career change | Monster.com | 1 |
| 26 | SELF: career change | Other | 1 |
| 27 | SELF: career change | Search Engine - Google Bing Yahoo | 2 |
| 28 | SELF: career change | Word of Mouth | 1 |
| 29 | SELF: hours | Diversity Job Fair | 2 |
| 30 | SELF: hours | Monster.com | 1 |
| 31 | SELF: hours | Newspager/Magazine | 2 |
| 32 | SELF: hours | Pay Per Click | 1 |
| 33 | SELF: hours | Social Networks - Facebook Twitter etc | 1 |
| 34 | SELF: hours | Vendor Referral | 1 |
| 35 | SELF: maternity leave - did not return | Information Session | 1 |
| 36 | SELF: maternity leave - did not return | Monster.com | 1 |
| 37 | SELF: maternity leave - did not return | Social Networks - Facebook Twitter etc | 1 |
| 38 | SELF: medical issues | Diversity Job Fair | 1 |
| 39 | SELF: medical issues | MBTA ads | 1 |
| 40 | SELF: medical issues | Monster.com | 1 |
| 41 | SELF: military | Diversity Job Fair | 1 |
| 42 | SELF: military | Pay Per Click - Google | 1 |
| 43 | SELF: military | Search Engine - Google Bing Yahoo | 2 |
| 44 | SELF: more money | Employee Referral | 1 |
| 45 | SELF: more money | Monster.com | 2 |
| 46 | SELF: more money | On-campus Recruiting | 1 |
| 47 | SELF: more money | Pay Per Click - Google | 2 |
| 48 | SELF: more money | Search Engine - Google Bing Yahoo | 4 |
| 49 | SELF: more money | Word of Mouth | 1 |
| 50 | SELF: performance | Employee Referral | 1 |
| 51 | SELF: relocation out of area | Diversity Job Fair | 1 |
| 52 | SELF: relocation out of area | Glassdoor | 1 |
| 53 | SELF: relocation out of area | Newspager/Magazine | 1 |
| 54 | SELF: relocation out of area | Other | 1 |
| 55 | SELF: relocation out of area | Word of Mouth | 1 |
| 56 | SELF: retiring | Billboard | 1 |
| 57 | SELF: retiring | Monster.com | 1 |
| 58 | SELF: retiring | Professional Society | 2 |
| 59 | SELF: return to school | Billboard | 1 |
| 60 | SELF: return to school | Diversity Job Fair | 1 |
| 61 | SELF: return to school | Monster.com | 1 |
| 62 | SELF: return to school | Search Engine - Google Bing Yahoo | 1 |
| 63 | SELF: return to school | Word of Mouth | 1 |
| 64 | SELF: unhappy | Billboard | 2 |
| 65 | SELF: unhappy | Diversity Job Fair | 1 |
| 66 | SELF: unhappy | Internet Search | 1 |
| 67 | SELF: unhappy | MBTA ads | 2 |
| 68 | SELF: unhappy | Monster.com | 2 |
| 69 | SELF: unhappy | Newspager/Magazine | 1 |
| 70 | SELF: unhappy | Other | 1 |
| 71 | SELF: unhappy | Search Engine - Google Bing Yahoo | 1 |
| 72 | SELF: unhappy | Vendor Referral | 1 |
| 73 | SELF: unhappy | Website Banner Ads | 1 |
| 74 | SELF: unhappy | Word of Mouth | 1 |
# Build a grid of plots showing termination reasons by employee recruitment source
g=sns.catplot(
kind="bar",
x="Reason For Term",
y="count",
hue="Reason For Term",
data=dfg_TermReasonAndSource,
col="Employee Source",
ci=None,
col_wrap=1,
height=3.5,
aspect=6,
palette="tab20",
margin_titles=True,
sharex=True,
sharey=True,
alpha=1.0,
dodge=False,
legend=False, # no legend inside the grid; a separate one is added below
legend_out=True, # place the separate legend outside the plot area
)
g.fig.suptitle(
"Termination reasons by recruitment source",
fontsize=20, x=0.28, y=1.025)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=14)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=14)
# Set the font size and rotation of the bar labels
g.set_xticklabels(rotation=80,fontsize=14)
# Adjust the spacing between individual plots
g.fig.subplots_adjust(hspace=0.4)
# Add a shared legend; set its bounding box (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0, 0.72, 0.575,0.3), loc='upper center', ncol=6, title='Reasons for Termination', fontsize=14)
plt.show()
CONCLUSION
# Build a DataFrame to examine termination reasons by year of hire
# There is no need to account for years in which no one was hired; someone left the company in every year.
# So the data are not tied to a continuous time scale.
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
DATE_TRUNC('year', "Date of Hire") AS year, -- truncate hire dates to the year
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS termination_reason
FROM
hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
),
year_of_hire AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
termination_reason
FROM
employee_selection
GROUP BY
year,
termination_reason
),
totals AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
'[TOTAL]' AS termination_reason
FROM
employee_selection
GROUP BY
year
)
(SELECT *
FROM
year_of_hire)
UNION ALL
(SELECT *
FROM
totals)
ORDER BY
termination_reason,
year
;
"""
dfg_termreason_over_date_hire = pd.read_sql(sql_quiery, conn)
#dfg_termreason_over_date_hire
# Build a grid of plots of termination reasons by the employee's year of hire
# Each termination reason gets its own panel in the grid
g = sns.relplot(data=dfg_termreason_over_date_hire,
x="year",
y="employee_count",
col="termination_reason",
hue="termination_reason",
kind="line",
palette="brg",
linewidth=4,
zorder=7,
col_wrap=4,
legend=False,
marker='o',
facet_kws=dict(sharex=False, sharey=False)
)
# Set additional parameters for each plot in the grid
for termination_reason, ax in g.axes_dict.items(): # iterate over the individual panels
    # Add each panel's title as an annotation inside the plot area
    ax.text(.05, .95, termination_reason, transform=ax.transAxes, fontweight="bold")
    # Draw "shadow" lines for all termination reasons, excluding [TOTAL], in the background of each panel
    sns.lineplot(data=dfg_termreason_over_date_hire[dfg_termreason_over_date_hire['termination_reason'] != '[TOTAL]'],
                 x="year",
                 y="employee_count",
                 units="termination_reason",
                 estimator=None,
                 color=".7",
                 linewidth=1,
                 ax=ax
                 )
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
# Set the overall figure title
g.fig.suptitle("Termination reasons by the employee's year of hire",
fontsize=16, x=0.50, y=1.015)
plt.show()
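The `termination_reason` label used in these queries is built with `CASE` and `CONCAT_WS`. As a minimal sketch of that labelling rule in plain Python (the function name `termination_label` is hypothetical, and the report itself keeps this logic in SQL), assuming PostgreSQL semantics where `CONCAT_WS` skips NULL arguments and a `CASE` without `ELSE` yields NULL:

```python
from typing import Optional


def termination_label(status: str, reason: Optional[str]) -> Optional[str]:
    """Mimic CASE ... THEN CONCAT_WS(': ', prefix, "Reason For Term") END."""
    # Prefixes used in the queries: SELF for resignations, ADM for dismissals
    prefixes = {
        "Voluntarily Terminated": "SELF",
        "Terminated for Cause": "ADM",
    }
    prefix = prefixes.get(status)
    if prefix is None:
        # A CASE expression without ELSE returns NULL for any other status
        return None
    # CONCAT_WS ignores NULL arguments, so a missing reason leaves only the prefix
    parts = [prefix] if reason is None else [prefix, reason]
    return ": ".join(parts)


print(termination_label("Voluntarily Terminated", "unhappy"))
```

Keeping the prefix in the label lets a single column distinguish resignations from dismissals in every panel of the grid.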
CONCLUSION
# Build a DataFrame to examine termination reasons by year of termination
# There are no empty years to account for here: someone left the company in every year.
# So the data are not tied to a continuous time scale.
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
DATE_TRUNC('year', "Date of Termination") AS year, -- truncate termination dates to the year
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS termination_reason
FROM
hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
),
year_of_hire AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
termination_reason
FROM
employee_selection
GROUP BY
year,
termination_reason
),
totals AS
(SELECT
year,
COUNT("Employee Number") AS employee_count,
'[TOTAL]' AS termination_reason
FROM
employee_selection
GROUP BY
year
)
(SELECT *
FROM
year_of_hire)
UNION ALL
(SELECT *
FROM
totals)
ORDER BY
termination_reason,
year
;
"""
dfg_termreason_over_date_term = pd.read_sql(sql_quiery, conn)
#dfg_termreason_over_date_term
# Build a grid of plots of termination reasons by the employee's year of termination
# Each termination reason gets its own panel in the grid
g = sns.relplot(data=dfg_termreason_over_date_term,
x="year",
y="employee_count",
col="termination_reason",
hue="termination_reason",
kind="line",
palette="gist_rainbow",
linewidth=3,
zorder=7,
col_wrap=4,
legend=False,
marker='o',
facet_kws=dict(sharex=False, sharey=False)
)
# Set additional parameters for each plot in the grid
for termination_reason, ax in g.axes_dict.items(): # iterate over the individual panels
    # Add each panel's title as an annotation inside the plot area
    ax.text(.05, .95, termination_reason, transform=ax.transAxes, fontweight="bold")
    # Draw "shadow" lines for all termination reasons, excluding [TOTAL], in the background of each panel
    sns.lineplot(data=dfg_termreason_over_date_term[dfg_termreason_over_date_term['termination_reason'] != '[TOTAL]'],
                 x="year",
                 y="employee_count",
                 units="termination_reason",
                 estimator=None,
                 color=".7",
                 linewidth=1,
                 ax=ax
                 )
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("Termination reasons by the employee's year of termination",
fontsize=16, x=0.50, y=1.015)
plt.show()
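The two queries above append a `[TOTAL]` series to the per-reason counts with `UNION ALL`. For cross-checking the shape of that result, the same idea can be sketched in pandas on made-up rows (the sample values, including the reason `ADM: attendance`, are invented for illustration; the report's final DataFrames still come from SQL):

```python
import pandas as pd

# Made-up rows standing in for the employee_selection subquery
employees = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015, 2015],
    "termination_reason": [
        "SELF: unhappy", "SELF: more money",
        "SELF: unhappy", "SELF: unhappy", "ADM: attendance",
    ],
})

# Equivalent of GROUP BY year, termination_reason
per_reason = (
    employees.groupby(["year", "termination_reason"])
    .size()
    .reset_index(name="employee_count")
)

# Equivalent of GROUP BY year with a constant '[TOTAL]' label
totals = (
    employees.groupby("year")
    .size()
    .reset_index(name="employee_count")
    .assign(termination_reason="[TOTAL]")
)

# Equivalent of UNION ALL followed by ORDER BY termination_reason, year
result = (
    pd.concat([per_reason, totals], ignore_index=True)
    .sort_values(["termination_reason", "year"])
    .reset_index(drop=True)
)
print(result)
```

Because `[TOTAL]` is just another value of `termination_reason`, the plotting code can facet on that column without special handling, and only needs to filter it out for the grey background lines.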
CONCLUSION
Managers are excluded, because no KPIs are defined for them.
# Build a DataFrame to examine termination reasons against KPIs
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Name",
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS termination_reason
FROM
hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
),
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
)
SELECT
termination_reason,
"KPI_Name",
AVG("KPI_Value") AS mean_KPI,
COUNT("Employee Name")
FROM
employee_selection
LEFT JOIN
kpi_s
USING("Employee Name")
GROUP BY
termination_reason,
"KPI_Name"
ORDER BY
termination_reason,
"KPI_Name"
;
"""
dfg_termreason_over_KPI = pd.read_sql(sql_quiery, conn).dropna()
#dfg_termreason_over_KPI
# Build a grid of plots of termination reasons against KPIs
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g=sns.catplot(
kind="bar",
x="termination_reason",
y="mean_kpi",
hue="termination_reason",
data=dfg_termreason_over_KPI,
col="KPI_Name",
col_order=kpi_list,
ci=None,
col_wrap=1,
height=3.5,
aspect=6,
palette="tab20",
margin_titles=True,
sharex=True,
sharey=True,
alpha=1.0,
dodge=False,
legend=False, # no legend inside the grid; a separate one is added below
legend_out=True, # place the separate legend outside the plot area
)
# Set the figure title
g.fig.suptitle(
"Termination reasons by mean KPI values",
fontsize=20, x=0.28, y=1.125)
# Set the X-axis label and its font size
g.set_xlabels("Termination Reasons",fontsize=14)
# Set the Y-axis label and its font size
g.set_ylabels("Average KPI value",fontsize=14)
# Set the font size and rotation of the bar labels
g.set_xticklabels(rotation=80,fontsize=14)
# Adjust the spacing between individual plots
g.fig.subplots_adjust(hspace=0.25)
# Add a shared legend; set its bounding box (x, y, width, height), number of columns, and title
g.add_legend(bbox_to_anchor=(0, 0.8, 0.55, 0.3),
loc='upper center', ncol=6, title='Reasons for Termination', fontsize=14)
plt.show()
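The `kpi_s` subquery turns four KPI columns into `KPI_Name` / `KPI_Value` pairs with a chain of `UNION ALL` selects. For readers more familiar with pandas, the same wide-to-long reshaping can be sketched with `DataFrame.melt` on made-up values (the two employees and their numbers are invented; this is only an illustration, the report keeps the reshaping in SQL):

```python
import pandas as pd

# Made-up stand-in for production_staff with the four KPI columns
staff = pd.DataFrame({
    "Employee Name": ["Employee A", "Employee B"],
    "Abutments/Hour Wk 1": [10.0, 12.0],
    "Abutments/Hour Wk 2": [11.0, 13.0],
    "Daily Error Rate": [1.0, 0.0],
    "90-day Complaints": [0.0, 2.0],
})

# Wide-to-long: one row per (employee, KPI), like the UNION ALL chain
kpi_long = staff.melt(
    id_vars="Employee Name",
    var_name="KPI_Name",
    value_name="KPI_Value",
)

# Mean per KPI, analogous to AVG("KPI_Value") ... GROUP BY "KPI_Name"
mean_kpi = kpi_long.groupby("KPI_Name")["KPI_Value"].mean()
print(mean_kpi)
```

The long layout is what allows a single `col="KPI_Name"` facet in the bar chart instead of four separate plotting calls.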
CONCLUSION
# Build a DataFrame to examine termination reasons by employee age
sql_quiery = \
"""
WITH
employee_selection AS
(SELECT
"Employee Number",
-- convert the interval between the termination date and the date of birth to full years
CAST(EXTRACT(year FROM AGE("Date of Termination", dob)) AS INTEGER) AS age_of_term,
sex,
racedesc,
maritaldesc,
CASE WHEN "Employment Status" = 'Voluntarily Terminated'
THEN CONCAT_WS(': ', 'SELF', "Reason For Term")
WHEN "Employment Status" = 'Terminated for Cause'
THEN CONCAT_WS(': ', 'ADM', "Reason For Term")
END AS termination_reason
FROM
hr_dataset
WHERE
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause' )
),
totals AS
(SELECT
"Employee Number",
age_of_term,
sex,
racedesc,
maritaldesc,
'[TOTAL]' AS termination_reason
FROM
employee_selection
)
(SELECT *
FROM
employee_selection)
UNION ALL
(SELECT *
FROM
totals)
ORDER BY
termination_reason,
age_of_term,
sex,
racedesc,
maritaldesc
;
"""
dfg_termreason_over_social = pd.read_sql(sql_quiery, conn)
#dfg_termreason_over_social
# Build a grid of plots of termination reasons by employee age at termination
# Each termination reason gets its own panel in the grid
g=sns.displot(data=dfg_termreason_over_social,
x="age_of_term",
hue="termination_reason",
col="termination_reason",
col_wrap=4,
binwidth=5,
binrange=(20, 65),
palette="tab20",
legend=False,
facet_kws=dict(sharex=True,
sharey=False,
xlim=(15,70))
)
# Set additional parameters for each plot in the grid
for termination_reason, ax in g.axes_dict.items(): # iterate over the individual panels
    # Add each panel's title as an annotation inside the plot area
    ax.text(.95, .95, termination_reason, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("Termination reasons by employee age at termination",
fontsize=16, x=0.50, y=1.015)
plt.show()
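The `age_of_term` column relies on `EXTRACT(year FROM AGE("Date of Termination", dob))`, which counts completed years rather than the difference between calendar years. A minimal sketch of that rule in plain Python (the helper name `full_years` and the sample dates are hypothetical, and the function is only intended to match the SQL expression for ordinary dates):

```python
from datetime import date


def full_years(born: date, on: date) -> int:
    """Number of completed years between a birth date and a reference date."""
    years = on.year - born.year
    # If the birthday has not yet occurred in the reference year, subtract one
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


# One day before the 35th birthday still counts as 34 full years
print(full_years(date(1980, 6, 15), date(2015, 6, 14)))
print(full_years(date(1980, 6, 15), date(2015, 6, 15)))
```

This matters for the 5-year histogram bins: a plain year difference would push employees who left shortly before a birthday into the next bin.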
CONCLUSION
# Build a grid of plots of termination reasons by employee sex
# Each termination reason gets its own panel in the grid
data=dfg_termreason_over_social
g = sns.catplot(x="sex",
col="termination_reason",
col_wrap=4,
data=data,
kind="count",
palette="Pastel1",
legend=False,
sharey=False
)
# Set additional parameters for each plot in the grid
for termination_reason, ax in g.axes_dict.items(): # iterate over the individual panels
    # Add each panel's title as an annotation inside the plot area
    ax.text(.95, .95, termination_reason, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("Termination reasons by employee sex",
fontsize=16, x=0.50, y=1.015)
plt.show()
CONCLUSION
# Build a grid of plots of termination reasons by the employee's race/ethnicity
# Each termination reason gets its own panel in the grid
data=dfg_termreason_over_social
g = sns.catplot(x="racedesc",
col="termination_reason",
col_wrap=4,
data=data,
kind="count",
palette="Paired_r",
legend=False,
sharey=False
)
# Set additional parameters for each plot in the grid
for termination_reason, ax in g.axes_dict.items(): # iterate over the individual panels
    # Add each panel's title as an annotation inside the plot area
    ax.text(.95, .95, termination_reason, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("Termination reasons by the employee's race/ethnicity",
fontsize=16, x=0.50, y=1.015)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=14)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=14)
# Set the font size and rotation of the bar labels
g.set_xticklabels(rotation=75,fontsize=14)
plt.show()
CONCLUSION
# Build a grid of plots of termination reasons by the employee's marital status
# Each termination reason gets its own panel in the grid
data=dfg_termreason_over_social
g = sns.catplot(x="maritaldesc",
col="termination_reason",
col_wrap=4,
data=data,
kind="count",
palette="Set3",
legend=False,
sharey=False
)
# Set additional parameters for each plot in the grid
for termination_reason, ax in g.axes_dict.items(): # iterate over the individual panels
    # Add each panel's title as an annotation inside the plot area
    ax.text(.95, .95, termination_reason, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual panel titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("Termination reasons by the employee's marital status",
fontsize=16, x=0.50, y=1.015)
# Set the font size of the X-axis label
g.set_xlabels(fontsize=14)
# Set the font size of the Y-axis label
g.set_ylabels(fontsize=14)
# Set the font size and rotation of the bar labels
g.set_xticklabels(rotation=75,fontsize=14)
plt.show()
CONCLUSION
NOTE
Most of these relationships mirror those already found for length of service in the company, so only selected relationships are examined below.
# Build a DataFrame to examine the distribution of the company's employees by termination date
# The time scale must also include months in which no one was terminated.
sql_quiery = \
"""
-- Build subqueries for the selection from hr_dataset and for a shared time scale running from the earliest
-- termination date to the latest (generate_series()). Join them on month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
DATE_TRUNC('month', "Date of Termination") AS month, -- truncate termination dates to the month (start of month)
"Employment Status"
FROM
hr_dataset
WHERE -- only employees who are no longer with the company
"Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause'
),
month_of_termination AS
(SELECT
COUNT("Employee Number") AS employee_count,
month,
"Employment Status"
FROM
employee_selection
GROUP BY
month,
"Employment Status"
ORDER BY
month,
"Employment Status"
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Termination"))
FROM hr_dataset
WHERE "Employment Status" = 'Voluntarily Terminated' OR "Employment Status" = 'Terminated for Cause'
), -- earliest termination date
(SELECT DATE_TRUNC('month', MAX("Date of Termination"))
FROM hr_dataset
WHERE "Employment Status" = 'Voluntarily Terminated' OR "Employment Status" = 'Terminated for Cause'
), -- latest termination date
'1 month') AS month -- one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
month_of_termination
USING(month)
;
"""
df_date_term_term = pd.read_sql(sql_quiery, conn)
df_date_term_term
| | month | employee_count | Employment Status |
|---|---|---|---|
| 0 | 2010-07-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 1 | 2010-08-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 2 | 2010-09-01 00:00:00+00:00 | NaN | None |
| 3 | 2010-10-01 00:00:00+00:00 | NaN | None |
| 4 | 2010-11-01 00:00:00+00:00 | NaN | None |
| 5 | 2010-12-01 00:00:00+00:00 | NaN | None |
| 6 | 2011-01-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 7 | 2011-02-01 00:00:00+00:00 | NaN | None |
| 8 | 2011-03-01 00:00:00+00:00 | NaN | None |
| 9 | 2011-04-01 00:00:00+00:00 | NaN | None |
| 10 | 2011-05-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 11 | 2011-06-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 12 | 2011-07-01 00:00:00+00:00 | NaN | None |
| 13 | 2011-08-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 14 | 2011-08-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 15 | 2011-09-01 00:00:00+00:00 | 5.0 | Voluntarily Terminated |
| 16 | 2011-10-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 17 | 2011-11-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 18 | 2011-12-01 00:00:00+00:00 | NaN | None |
| 19 | 2012-01-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 20 | 2012-02-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 21 | 2012-03-01 00:00:00+00:00 | NaN | None |
| 22 | 2012-04-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 23 | 2012-05-01 00:00:00+00:00 | NaN | None |
| 24 | 2012-06-01 00:00:00+00:00 | NaN | None |
| 25 | 2012-07-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 26 | 2012-08-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 27 | 2012-09-01 00:00:00+00:00 | 4.0 | Voluntarily Terminated |
| 28 | 2012-09-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 29 | 2012-10-01 00:00:00+00:00 | NaN | None |
| 30 | 2012-11-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 31 | 2012-12-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 32 | 2013-01-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 33 | 2013-02-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 34 | 2013-03-01 00:00:00+00:00 | NaN | None |
| 35 | 2013-04-01 00:00:00+00:00 | 4.0 | Voluntarily Terminated |
| 36 | 2013-05-01 00:00:00+00:00 | NaN | None |
| 37 | 2013-06-01 00:00:00+00:00 | 4.0 | Voluntarily Terminated |
| 38 | 2013-06-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 39 | 2013-07-01 00:00:00+00:00 | NaN | None |
| 40 | 2013-08-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 41 | 2013-09-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 42 | 2013-10-01 00:00:00+00:00 | NaN | None |
| 43 | 2013-11-01 00:00:00+00:00 | NaN | None |
| 44 | 2013-12-01 00:00:00+00:00 | NaN | None |
| 45 | 2014-01-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 46 | 2014-02-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 47 | 2014-03-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 48 | 2014-04-01 00:00:00+00:00 | 4.0 | Voluntarily Terminated |
| 49 | 2014-05-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 50 | 2014-06-01 00:00:00+00:00 | NaN | None |
| 51 | 2014-07-01 00:00:00+00:00 | NaN | None |
| 52 | 2014-08-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 53 | 2014-09-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 54 | 2014-09-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 55 | 2014-10-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 56 | 2014-11-01 00:00:00+00:00 | NaN | None |
| 57 | 2014-12-01 00:00:00+00:00 | NaN | None |
| 58 | 2015-01-01 00:00:00+00:00 | NaN | None |
| 59 | 2015-02-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 60 | 2015-03-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 61 | 2015-04-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 62 | 2015-05-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 63 | 2015-06-01 00:00:00+00:00 | 5.0 | Voluntarily Terminated |
| 64 | 2015-07-01 00:00:00+00:00 | NaN | None |
| 65 | 2015-08-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 66 | 2015-09-01 00:00:00+00:00 | 4.0 | Voluntarily Terminated |
| 67 | 2015-09-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 68 | 2015-10-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 69 | 2015-11-01 00:00:00+00:00 | 6.0 | Voluntarily Terminated |
| 70 | 2015-12-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
| 71 | 2015-12-01 00:00:00+00:00 | 1.0 | Terminated for Cause |
| 72 | 2016-01-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 73 | 2016-02-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 74 | 2016-02-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 75 | 2016-03-01 00:00:00+00:00 | NaN | None |
| 76 | 2016-04-01 00:00:00+00:00 | 2.0 | Voluntarily Terminated |
| 77 | 2016-05-01 00:00:00+00:00 | 3.0 | Voluntarily Terminated |
| 78 | 2016-05-01 00:00:00+00:00 | 2.0 | Terminated for Cause |
| 79 | 2016-06-01 00:00:00+00:00 | 1.0 | Voluntarily Terminated |
# Plot the distribution of dismissed and resigned employees by month and year of termination
g=sns.relplot(data=df_date_term_term,
x="month",
y="employee_count",
hue="Employment Status",
height=4,
aspect=3.5,
palette="Set2",
alpha=0.9,
ci=None,
kind="line",
marker='o'
)
# Define the axis ticks as a pandas date range at quarterly intervals (for readability):
scale_span = pd.date_range(start=df_date_term_term['month'].min(),
end= df_date_term_term['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
plt.yticks(ticks=list(range(0, 7, 1)), fontsize=14) # set the Y ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=14) # X-axis label and its font size
g.set_ylabels("Employees terminated", fontsize=14) # Y-axis label and its font size
# Figure title:
g.fig.suptitle("Overall distribution of dismissed and resigned employees by month and year of termination",
fontsize=16, y=1.025)
plt.show()
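The query for this plot joins a `generate_series()` month calendar to the aggregated terminations with a `LEFT OUTER JOIN`, which is why months without terminations appear as `NaN` rows in the output table. A minimal pandas sketch of the same pattern, using three made-up months that happen to mirror the first rows of that table (this is an illustration only; the report's DataFrame is still produced in SQL):

```python
import pandas as pd

# Made-up monthly counts standing in for the month_of_termination subquery
terminations = pd.DataFrame({
    "month": pd.to_datetime(["2010-07-01", "2010-08-01", "2011-01-01"]),
    "employee_count": [1, 1, 1],
})

# Equivalent of generate_series(min, max, '1 month'): one row per month start
calendar = pd.DataFrame({
    "month": pd.date_range(
        terminations["month"].min(),
        terminations["month"].max(),
        freq="MS",
    )
})

# Equivalent of months LEFT OUTER JOIN month_of_termination USING(month)
full_scale = calendar.merge(terminations, on="month", how="left")
print(full_scale)
```

Keeping the empty months as missing values, rather than dropping them, preserves the gaps on the time axis, so quiet periods are not visually compressed in the line plot.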
CONCLUSION
# Build a DataFrame to analyse how termination dates correspond to hire dates
sql_quiery = \
"""
SELECT
"Employee Number",
"Date of Hire",
"Date of Termination"
FROM
hr_dataset
WHERE
"Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause'
ORDER BY
"Date of Hire" DESC,
"Date of Termination"
;
"""
dfg_empldate_over_termdate = pd.read_sql(sql_quiery, conn)
#dfg_empldate_over_termdate
g=sns.relplot(
data=dfg_empldate_over_termdate,
x="Date of Hire",
y="Date of Termination",
height=8,
color="dodgerblue"
)
# Define the axis ticks as a pandas date range at quarterly intervals (for readability):
scale_span = pd.date_range(start=dfg_empldate_over_termdate['Date of Hire'].min(),
end= dfg_empldate_over_termdate['Date of Termination'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=8, rotation=90) # set the X ticks and their label size
plt.yticks(ticks=scale_span, fontsize=8, rotation=0) # set the Y ticks and their label size
# Highlight the "critical" regions of the plot
g.axes[0,0].axvspan("2010-10", "2012-03", facecolor="gold", alpha=0.25)
g.axes[0,0].axhspan("2011-04", "2012-05", facecolor="deeppink", alpha=0.25)
g.axes[0,0].axhspan("2012-07", "2013-11", facecolor="deeppink", alpha=0.25)
g.axes[0,0].axhspan("2014-02", "2014-11", facecolor="deeppink", alpha=0.25)
g.axes[0,0].axhspan("2015-04", "2016-10", facecolor="deeppink", alpha=0.25)
# Figure title:
g.fig.suptitle("Correspondence between termination dates and hire dates of terminated employees", fontsize=16, x=0.5, y=1.025)
plt.show()
CONCLUSION
Analysis of the pairwise correspondence between termination dates and hire dates refines the conclusions drawn earlier about how length of service depends on these variables, and shows the following trends.
# Build a DataFrame to examine the distribution of the company's employees by termination date and age at termination
# The time scale must also include months in which no one was terminated.
sql_quiery = \
"""
-- Build subqueries for the selection from hr_dataset and for a shared time scale running from the earliest
-- termination date to the latest (generate_series()). Join them on month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
CAST(EXTRACT(year FROM AGE("Date of Termination", dob)) AS INTEGER) AS age, -- age in full years at the termination date
DATE_TRUNC('month', "Date of Termination") AS month -- truncate termination dates to the month (start of month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Termination"))
FROM hr_dataset), -- earliest termination date
(SELECT DATE_TRUNC('month', MAX("Date of Termination"))
FROM hr_dataset), -- latest termination date
'1 month') AS month -- one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_term_over_age = pd.read_sql(sql_quiery, conn)
#dfg_date_term_over_age
# Plot termination date against age at termination
g=sns.relplot(x="month",
y="age",
data=dfg_date_term_over_age,
kind='scatter',
height=5,
aspect=2.8
)
# Define the axis ticks as a pandas date range at quarterly intervals (for readability):
scale_span = pd.date_range(start=dfg_date_term_over_age['month'].min(),
end= dfg_date_term_over_age['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
plt.yticks(ticks=list(range(15, 70, 5)), fontsize=14) # set the Y ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label and its font size
g.set_ylabels("Age", fontsize=14) # Y-axis label and its font size
# Figure title:
g.fig.suptitle("Termination date versus employee age at termination",
fontsize=16, y=1.025)
# Highlight the "critical" regions of the plot
g.axes[0,0].axvspan("2011-07", "2012-05", facecolor="orange", alpha=0.25)
g.axes[0,0].axvspan("2012-07", "2013-12", facecolor="orange", alpha=0.25)
g.axes[0,0].axvspan("2014-01", "2014-12", facecolor="orange", alpha=0.25)
g.axes[0,0].axvspan("2015-01", "2016-07", facecolor="orange", alpha=0.25)
g.axes[0,0].axhspan(20, 46, facecolor="deeppink", alpha=0.25)
plt.show()
CONCLUSION
# Build a DataFrame to examine the distribution of the company's employees by termination date and sex
# The time scale must also include months in which no one was terminated.
sql_quiery = \
"""
-- Build subqueries for the selection from hr_dataset and for a shared time scale running from the earliest
-- termination date to the latest (generate_series()). Join them on month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
sex,
DATE_TRUNC('month', "Date of Termination") AS month -- truncate termination dates to the month (start of month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Termination"))
FROM hr_dataset), -- earliest termination date
(SELECT DATE_TRUNC('month', MAX("Date of Termination"))
FROM hr_dataset), -- latest termination date
'1 month') AS month -- one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_term_over_sex = pd.read_sql(sql_quiery, conn)
#dfg_date_term_over_sex
# Plot termination dates by employee sex
g=sns.displot(x="month",
data=dfg_date_term_over_sex,
col="sex",
col_wrap=1,
hue="sex",
binwidth=91,
kde=True,
height=4,
aspect=2.8,
facet_kws=dict(sharex=True, sharey=True)
)
# Define the axis ticks as a pandas date range at quarterly intervals (for readability):
scale_span = pd.date_range(start=dfg_date_term_over_sex['month'].min(),
end= dfg_date_term_over_sex['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label and its font size
g.set_ylabels(fontsize=14) # font size of the Y-axis label
# Figure title:
g.fig.suptitle("Termination dates by employee sex",
fontsize=16, y=1.025)
plt.show()
CONCLUSION
# Build a DataFrame to examine the distribution of the company's employees by termination date and
# race/ethnicity.
# The time scale must also include months in which no one was terminated.
sql_quiery = \
"""
-- Build subqueries for the selection from hr_dataset and for a shared time scale running from the earliest
-- termination date to the latest (generate_series()). Join them on month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
racedesc,
DATE_TRUNC('month', "Date of Termination") AS month -- truncate termination dates to the month (start of month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Termination"))
FROM hr_dataset), -- earliest termination date
(SELECT DATE_TRUNC('month', MAX("Date of Termination"))
FROM hr_dataset), -- latest termination date
'1 month') AS month -- one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_term_over_racedesc = pd.read_sql(sql_quiery, conn)
#dfg_date_term_over_racedesc
# Plot termination dates by the employee's race/ethnicity
g=sns.displot(x="month",
data=dfg_date_term_over_racedesc,
col="racedesc",
col_wrap=1,
hue="racedesc",
binwidth=91,
kde=True,
height=4,
aspect=2.8,
facet_kws=dict(sharex=True, sharey=True)
)
# Define the axis ticks as a pandas date range at quarterly intervals (for readability):
scale_span = pd.date_range(start=dfg_date_term_over_racedesc['month'].min(),
end= dfg_date_term_over_racedesc['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label and its font size
g.set_ylabels(fontsize=14) # font size of the Y-axis label
# Figure title:
g.fig.suptitle("Termination dates by the employee's race/ethnicity",
fontsize=16, y=1.025)
plt.show()
CONCLUSION
# Build a DataFrame to examine the distribution of the company's employees by termination date and
# marital status.
# The time scale must also include months in which no one was terminated.
sql_quiery = \
"""
-- Build subqueries for the selection from hr_dataset and for a shared time scale running from the earliest
-- termination date to the latest (generate_series()). Join them on month, keeping every row of the full time
-- scale
WITH
employee_selection AS
(SELECT
"Employee Number",
maritaldesc,
DATE_TRUNC('month', "Date of Termination") AS month -- truncate termination dates to the month (start of month)
FROM
hr_dataset
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Termination"))
FROM hr_dataset), -- earliest termination date
(SELECT DATE_TRUNC('month', MAX("Date of Termination"))
FROM hr_dataset), -- latest termination date
'1 month') AS month -- one-month step
)
SELECT * FROM
months
LEFT OUTER JOIN
employee_selection
USING(month)
;
"""
dfg_date_term_over_maritaldesc = pd.read_sql(sql_quiery, conn)
#dfg_date_term_over_maritaldesc
# Plot how termination dates are distributed by employee marital status
g=sns.displot(x="month",
data=dfg_date_term_over_maritaldesc,
col="maritaldesc",
col_wrap=1,
hue="maritaldesc",
binwidth=91,
kde=True,
height=4,
aspect=2.8,
facet_kws=dict(sharex=True, sharey=True)
)
# Define the X scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_date_term_over_maritaldesc['month'].min(),
end= dfg_date_term_over_maritaldesc['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
g.set_xlabels("Months", fontsize=15) # X-axis label size
g.set_ylabels(fontsize=14) # Y-axis label size
# Figure title:
g.fig.suptitle("Termination date by employee marital status",
fontsize=16, y=1.025)
plt.show()
CONCLUSION
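The `months` subquery above builds a gap-free month scale with `generate_series()` and left-joins the terminations onto it, so that months without terminations are kept. A minimal Python sketch of the same idea, reduced to per-month counts on made-up dates (the `month_scale` helper and the sample data are hypothetical and not part of the notebook):

```python
from collections import Counter
from datetime import date


def month_scale(first: date, last: date) -> list[date]:
    """Return the first day of every month from `first` to `last`, inclusive."""
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(date(year, month, 1))
        # Step to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


# Hypothetical termination dates; nothing happens in Dec 2015 or Feb 2016.
terminations = [date(2015, 11, 4), date(2015, 11, 20), date(2016, 1, 15), date(2016, 3, 2)]

# Counterpart of DATE_TRUNC('month', ...) followed by a per-month count.
per_month = Counter(d.replace(day=1) for d in terminations)

# Counterpart of months LEFT JOIN counts: every month is kept, empty ones get 0.
scale = month_scale(min(terminations), max(terminations))
counts = [(m, per_month.get(m, 0)) for m in scale]
```

Without the full scale, the empty months would simply be missing from the result, and a histogram over the remaining rows would hide the gaps.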
sql_quiery = \
"""
-- Build the subqueries for a long-format KPI DataFrame
WITH
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
staff_list AS
(SELECT
"Employee Name",
"Position"
FROM
production_staff
WHERE -- Only current employees, excluding managers, for whom no KPIs are defined
"Employment Status" IN ('Active', 'Future Start', 'Leave of Absence') AND
"Position" <> 'Production Manager'
)
SELECT
*
FROM
kpi_s
RIGHT JOIN
staff_list
USING("Employee Name")
ORDER BY
"Position",
"KPI_Name"
"""
dfg_KPI_over_position = pd.read_sql(sql_quiery, conn)
#dfg_KPI_over_position
# Plot a grid of KPI distributions by position for current employees
# Each KPI gets its own plot in the grid
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g = sns.displot(x="KPI_Value",
hue="Position",
col_order=kpi_list,
col="KPI_Name",
col_wrap=2,
data=dfg_KPI_over_position,
kind="hist",
palette="Paired",
legend=True,
binrange=(0, 22),
binwidth=1,
facet_kws=dict(sharey=False)
)
# Set extra parameters for each plot in the grid
for KPI_Name, ax in g.axes_dict.items(): # iterate over the individual plots
    # Add each plot's title as an annotation placed inside the plot area
    ax.text(.95, .95, KPI_Name, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual plot titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("Overall KPI distribution by position for current Production employees",
fontsize=16, x=0.50, y=1.015)
# Set the X-axis label size
g.set_xlabels(fontsize=14)
# Set the Y-axis label size
g.set_ylabels(fontsize=14)
# Set the size and rotation of the X tick labels
g.set_xticklabels(rotation=75,fontsize=14)
plt.show()
CONCLUSION
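The `kpi_s` subquery above unpivots four KPI columns into long format with a hand-written `UNION ALL` block, and the same block recurs in the queries that follow. One way to avoid the repetition, sketched here as an assumption rather than as the notebook's approach, is to generate that SQL text from the KPI list; `build_kpi_union` is a hypothetical helper.

```python
KPI_COLUMNS = ["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]


def build_kpi_union(columns, table="production_staff"):
    """Build the long-format SELECT ... UNION ALL ... text for the given KPI columns.

    Column names must come from a fixed, trusted list, because they are
    interpolated directly into the SQL text.
    """
    selects = [
        'SELECT\n'
        '"Employee Name",\n'
        f"'{col}' AS \"KPI_Name\",\n"
        f'"{col}" AS "KPI_Value"\n'
        f'FROM\n{table}'
        for col in columns
    ]
    return "\nUNION ALL\n".join(selects)


kpi_union_sql = build_kpi_union(KPI_COLUMNS)
# The generated block can be dropped into a CTE exactly like the hand-written one.
sql_sketch = f"WITH\nkpi_s AS\n({kpi_union_sql}\n)\nSELECT * FROM kpi_s;"
```

The final DataFrame is still produced entirely by SQL; Python only assembles the query string.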
sql_quiery = \
"""
-- Build the subqueries for a long-format KPI DataFrame
WITH
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
staff_list AS
(SELECT
"Employee Name",
ROUND("Days Employed"::numeric / 365.25, 1) AS years_employed
FROM
hr_dataset
WHERE -- Exclude managers, for whom no KPIs are defined
TRIM(trailing from department) = 'Production' AND
position <> 'Production Manager'
-- The department field in hr_dataset holds a malformed 'Production' value (7 trailing spaces)
)
SELECT
*
FROM
kpi_s
INNER JOIN
staff_list
USING("Employee Name")
ORDER BY
"KPI_Name",
years_employed
"""
dfg_KPI_over_years_employed = pd.read_sql(sql_quiery, conn).dropna()
#dfg_KPI_over_years_employed
# Plot a grid of KPI versus time with the company
# Each KPI gets its own plot in the grid
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g = sns.lmplot(x="years_employed",
y="KPI_Value",
hue="KPI_Name",
col_order=kpi_list,
col="KPI_Name",
col_wrap=2,
data=dfg_KPI_over_years_employed,
palette="Set2",
facet_kws=dict(sharey=False)
)
# Set extra parameters for each plot in the grid
for KPI_Name, ax in g.axes_dict.items(): # iterate over the individual plots
    # Add each plot's title as an annotation placed inside the plot area
    ax.text(.95, .95, KPI_Name, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual plot titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("KPI versus time with the company for Production employees,\n"
"including those who were dismissed or resigned (linear regression)",
fontsize=16, x=0.50, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=14)
# Set the Y-axis label size
g.set_ylabels(fontsize=14)
# Set the size and rotation of the X tick labels
g.set_xticklabels(rotation=75,fontsize=14)
plt.show()
CONCLUSION
The analysis of KPI versus time with the company (for Production employees) shows that KPIs improve as tenure increases.
sql_quiery = \
"""
-- Build the subqueries for a long-format KPI DataFrame
WITH
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
staff_list AS
(SELECT
"Employee Name",
"Employee Source"
FROM
hr_dataset
WHERE -- Exclude managers, for whom no KPIs are defined
TRIM(trailing from department) = 'Production' AND
position <> 'Production Manager'
-- The department field in hr_dataset holds a malformed 'Production' value (7 trailing spaces)
)
SELECT
*
FROM
kpi_s
INNER JOIN
staff_list
USING("Employee Name")
ORDER BY
"KPI_Name",
"Employee Source"
"""
dfg_KPI_over_empl_source = pd.read_sql(sql_quiery, conn).dropna()
#dfg_KPI_over_empl_source
# Plot a grid of KPI versus hiring source
# Each KPI gets its own plot in the grid
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g = sns.catplot(x="Employee Source",
y="KPI_Value",
hue="Employee Source",
col_order=kpi_list,
col="KPI_Name",
col_wrap=1,
data=dfg_KPI_over_empl_source,
palette="tab20b",
sharey=False,
dodge=False,
height=4,
aspect=3,
kind="box"
)
# Set extra parameters for each plot in the grid
for KPI_Name, ax in g.axes_dict.items(): # iterate over the individual plots
    # Add each plot's title as an annotation placed inside the plot area
    ax.text(.95, .95, KPI_Name, horizontalalignment='right', transform=ax.transAxes, fontweight="bold")
# Remove the individual plot titles
g.set_titles("")
# Use a tight layout
g.tight_layout()
g.fig.suptitle("KPI versus hiring source for Production employees,\n"
"including those who were dismissed or resigned",
fontsize=16, x=0.50, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=14)
# Set the Y-axis label size
g.set_ylabels(fontsize=14)
# Set the size and rotation of the X tick labels
g.set_xticklabels(rotation=90,fontsize=12)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build the subqueries for a long-format KPI DataFrame
WITH
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
staff_list AS
(SELECT
"Employee Name",
DATE_TRUNC('month', "Date of Hire") AS month
FROM
production_staff
WHERE -- Exclude managers, for whom no KPIs are defined
"Position" <> 'Production Manager'
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("Date of Hire"))
FROM production_staff), -- earliest hire date
(SELECT DATE_TRUNC('month', MAX("Date of Hire")+ interval '3 months')
FROM production_staff), -- latest hire date plus three months
'1 month') AS month -- monthly step
)
SELECT
month,
"KPI_Name",
AVG("KPI_Value") AS avg_kpi
FROM
months
LEFT OUTER JOIN
(kpi_s
RIGHT JOIN
staff_list
USING("Employee Name"))
USING(month)
GROUP BY
"KPI_Name",
month
ORDER BY
"KPI_Name",
month
;
"""
dfg_KPI_over_date_hire = pd.read_sql(sql_quiery, conn).dropna()
#dfg_KPI_over_date_hire
# Plot average KPI values against the month of hire (for all Production employees)
# Each KPI is drawn as its own line on a single plot
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g = sns.relplot(x="month",
y="avg_kpi",
data=dfg_KPI_over_date_hire,
hue="KPI_Name",
hue_order=kpi_list,
palette="Dark2",
kind='line',
linewidth=2,
height=5,
aspect=2.8
)
g.fig.suptitle("Average KPI values by month and year of hire for all Production employees",
fontsize=16, x=0.50, y=1.015)
# Define the X scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_KPI_over_date_hire['month'].min(),
end= dfg_KPI_over_date_hire['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
# Set the X-axis label and its size
g.set_xlabels("months", fontsize=12)
# Set the Y-axis label and its size
g.set_ylabels("average KPI values", fontsize=12)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build the subqueries for a long-format KPI DataFrame
WITH
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
staff_list AS
(SELECT
"Employee Name",
DATE_TRUNC('month', "TermDate") AS month
FROM
production_staff
WHERE -- Exclude managers, for whom no KPIs are defined; keep only terminated employees
"Position" <> 'Production Manager' AND
("Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause')
),
months AS
(SELECT
month
FROM generate_series(
(SELECT DATE_TRUNC('month', MIN("TermDate"))
FROM production_staff), -- earliest termination date
(SELECT DATE_TRUNC('month', MAX("TermDate")+ interval '3 months')
FROM production_staff), -- latest termination date plus three months
'1 month') AS month -- monthly step
)
SELECT
month,
"KPI_Name",
AVG("KPI_Value") AS avg_kpi
FROM
months
LEFT OUTER JOIN
(kpi_s
RIGHT JOIN
staff_list
USING("Employee Name"))
USING(month)
GROUP BY
"KPI_Name",
month
ORDER BY
"KPI_Name",
month
;
"""
dfg_KPI_over_date_term = pd.read_sql(sql_quiery, conn).dropna()
#dfg_KPI_over_date_term
# Plot average KPI values against the month of termination (for terminated Production employees)
# Each KPI is drawn as its own line on a single plot
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g = sns.relplot(x="month",
y="avg_kpi",
data=dfg_KPI_over_date_term,
hue="KPI_Name",
hue_order=kpi_list,
palette="Dark2",
kind='line',
linewidth=2,
height=5,
aspect=2.8
)
g.fig.suptitle("Average KPI values by month and year of termination for Production employees",
fontsize=16, x=0.50, y=1.015)
# Define the X scale as a pandas date range with quarterly steps (for readability):
scale_span = pd.date_range(start=dfg_KPI_over_date_term['month'].min(),
end= dfg_KPI_over_date_term['month'].max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
# Set the X-axis label and its size
g.set_xlabels("months", fontsize=12)
# Set the Y-axis label and its size
g.set_ylabels("average KPI values", fontsize=12)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build the subqueries for a long-format KPI DataFrame
WITH
kpi_s AS
(SELECT
"Employee Name",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
staff_list AS
(SELECT
"Employee Name",
"Employee Source"
FROM
hr_dataset
WHERE -- Exclude managers, for whom no KPIs are defined
position <> 'Production Manager'
),
kpi_and_sources_list AS
(SELECT
"Employee Name",
"Employee Source" AS empl_source,
"KPI_Name",
"KPI_Value"
FROM
kpi_s
RIGHT JOIN
staff_list
USING("Employee Name")
),
empl_source_count AS
(SELECT
COUNT("Employee Name") AS empl_count,
"Employee Source" AS empl_source
FROM
hr_dataset
WHERE -- Exclude managers, for whom no KPIs are defined
TRIM(trailing from department) = 'Production' AND
position <> 'Production Manager'
-- The department field in hr_dataset holds a malformed 'Production' value (7 trailing spaces)
GROUP BY
"Employee Source"
),
empl_source_price AS
(SELECT
"Employment Source" AS empl_source,
"Total"
FROM
recruiting_costs
),
empl_count_per_source_price AS
(SELECT
*
FROM
(empl_source_count
LEFT JOIN
empl_source_price
USING (empl_source)
)
),
per_employee_price AS
(SELECT
empl_source,
ROUND("Total" / (SUM(empl_count) OVER (PARTITION BY empl_source)), 2) AS "cost_per_employee"
FROM
empl_count_per_source_price
ORDER BY
"cost_per_employee" DESC
)
SELECT
"Employee Name",
cost_per_employee,
"KPI_Name",
"KPI_Value"
FROM
kpi_and_sources_list
INNER JOIN
per_employee_price
USING(empl_source)
ORDER BY
"KPI_Name",
empl_source
;
"""
dfg_KPI_over_empl_cost = pd.read_sql(sql_quiery, conn).dropna()
#dfg_KPI_over_empl_cost
# Plot KPI against the notional cost of hiring one employee
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
g=sns.jointplot(
# The hiring-cost cutoff used to trim outliers can be adjusted:
data=dfg_KPI_over_empl_cost[dfg_KPI_over_empl_cost.cost_per_employee < 8000],
x="cost_per_employee",
y="KPI_Value",
hue="KPI_Name",
hue_order=kpi_list,
height=6
)
g.fig.suptitle("NOTIONAL: KPI versus the cost of hiring one Production employee,\n"
"including those who were dismissed or resigned",
fontsize=16, x=0.50, y=1.05)
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Build the subqueries for a DataFrame with KPIs and employee age (reshaped to long format)
WITH
KPI_S AS
(SELECT
"Employee Name",
"Position",
'Abutments/Hour Wk 1' AS "KPI_Name",
"Abutments/Hour Wk 1" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Position",
'Abutments/Hour Wk 2' AS "KPI_Name",
"Abutments/Hour Wk 2" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Position",
'Daily Error Rate' AS "KPI_Name",
"Daily Error Rate" AS "KPI_Value"
FROM
production_staff
UNION ALL
SELECT
"Employee Name",
"Position",
'90-day Complaints' AS "KPI_Name",
"90-day Complaints" AS "KPI_Value"
FROM
production_staff
),
EmplAge AS
(SELECT
"Employee Name",
"Employee Number",
age,
sex,
racedesc,
maritaldesc
FROM
hr_dataset
WHERE
rtrim(department, ' ') = 'Production'
)
SELECT
*
FROM
KPI_S
INNER JOIN
EmplAge
USING ("Employee Name")
WHERE "Position"<>'Production Manager'
ORDER BY
"Position",
age,
"Employee Name",
"KPI_Name"
;
"""
dfg_kpi_over_social = pd.read_sql(sql_quiery, conn)
#dfg_kpi_over_social
# Plot KPI against employee age.
# List of KPIs:
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
# Draw the KPI-versus-age plot
g=sns.lmplot(data=dfg_kpi_over_social,
x="age",
y="KPI_Value",
hue="Position",
col="KPI_Name",
col_order=kpi_list,
col_wrap=2,
palette="tab10",
height=4,
aspect=1,
facet_kws=dict(sharex=True, sharey=False)
)
g.fig.suptitle("KPI versus age for Production employees (linear regression)", fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=10)
# Set the Y-axis label size
g.set_ylabels(fontsize=10)
plt.setp(g.legend.get_texts(), fontsize=12) # legend text size
plt.show()
CONCLUSION
For employees in the Production Technician I position, KPI values tend to worsen with age. For employees in the Production Technician II position, by contrast, they tend to improve with age. However, the confidence intervals widen with age and are fairly wide.
# Plot KPI against employee sex.
# List of KPIs:
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
# Draw the KPI-versus-sex plot
g=sns.catplot(data=dfg_kpi_over_social,
x="Position",
y="KPI_Value",
hue="sex",
col="KPI_Name",
col_order=kpi_list,
col_wrap=2,
palette="Pastel1",
height=4,
aspect=1,
kind="boxen",
sharex=True, sharey=False
)
g.fig.suptitle("KPI versus sex for Production employees", fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=10)
# Set the Y-axis label size
g.set_ylabels(fontsize=10)
plt.setp(g.legend.get_texts(), fontsize=12) # legend text size
plt.show()
CONCLUSION
# Plot KPI against employee race/ethnicity.
# List of KPIs:
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
# Draw the KPI-versus-race/ethnicity plot
g=sns.catplot(data=dfg_kpi_over_social,
x="Position",
y="KPI_Value",
hue="racedesc",
col="KPI_Name",
col_order=kpi_list,
col_wrap=2,
palette="Set1_r",
height=4,
aspect=1.2,
kind="boxen",
sharex=True, sharey=False
)
g.fig.suptitle("KPI versus race/ethnicity for Production employees",
fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=10)
# Set the Y-axis label size
g.set_ylabels(fontsize=10)
plt.setp(g.legend.get_texts(), fontsize=12) # legend text size
plt.show()
CONCLUSION
# Plot KPI against employee marital status.
# List of KPIs:
kpi_list=["Abutments/Hour Wk 1", "Abutments/Hour Wk 2", "Daily Error Rate", "90-day Complaints"]
# Draw the KPI-versus-marital-status plot
g=sns.catplot(data=dfg_kpi_over_social,
x="Position",
y="KPI_Value",
hue="maritaldesc",
col="KPI_Name",
col_order=kpi_list,
col_wrap=2,
palette="Accent",
height=4,
aspect=1.2,
kind="boxen",
sharex=True, sharey=False
)
g.fig.suptitle("KPI versus marital status for Production employees",
fontsize=16, y=1.05)
# Set the X-axis label size
g.set_xlabels(fontsize=10)
# Set the Y-axis label size
g.set_ylabels(fontsize=10)
plt.setp(g.legend.get_texts(), fontsize=12) # legend text size
plt.show()
CONCLUSION
NOTE
- The calculations below assume that the total hiring cost given in recruiting_costs for each hiring source is the cumulative total for the whole period covered by the data.
- The hiring source Indeed is missing from recruiting_costs. Hiring costs through it are taken to be zero.
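The two assumptions above translate into simple arithmetic: each source's total is divided by the number of employees hired through it, and a source that is missing from recruiting_costs is costed at zero. A minimal sketch with invented figures (the counts and totals below are hypothetical and are not taken from the database):

```python
# Hypothetical head counts per hiring source (as in hr_dataset."Employee Source").
employees_per_source = {"Indeed": 8, "Billboard": 16, "Employee Referral": 31}

# Hypothetical total spend per source for the whole period (as in recruiting_costs."Total").
# "Indeed" is deliberately absent, mirroring the gap described above.
total_cost_per_source = {"Billboard": 6192, "Employee Referral": 0}

cost_per_employee = {
    # Missing sources fall back to 0, like treating NULL as zero after a LEFT JOIN.
    source: round(total_cost_per_source.get(source, 0) / count, 2)
    for source, count in employees_per_source.items()
}
```

Because the totals are treated as covering the whole period, the resulting per-employee figures are notional averages, not the actual cost of any individual hire.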
sql_quiery = \
"""
-- Create a temporary view (it is reused several times below): the cost of hiring one employee
CREATE OR REPLACE TEMPORARY VIEW
costs_per_employee AS -- per-source unit cost of hiring one employee
WITH
empl_source_count AS -- number of employees per hiring source
(SELECT
COUNT("Employee Name") AS empl_count,
"Employee Source" AS empl_source
FROM
hr_dataset
GROUP BY
"Employee Source"
),
empl_source_price AS -- total cost per hiring source
(SELECT
"Employment Source" AS empl_source,
"Total"
FROM
recruiting_costs
),
empl_count_per_source_price AS -- join of the two subqueries: total cost and employee count per source
(SELECT
empl_source,
COALESCE("Total", 0) AS "Total", -- sources missing from recruiting_costs are costed at zero
empl_count
FROM
(empl_source_count
LEFT JOIN
empl_source_price
USING (empl_source)
)
)
-- Resulting temporary view: per-source unit cost of hiring one employee.
SELECT
empl_source,
ROUND("Total" / (SUM(empl_count) OVER (PARTITION BY empl_source)), 2) AS empl_cost
FROM
empl_count_per_source_price
ORDER BY
empl_cost DESC
;
WITH
employee_selection AS -- selection: number of employees by employment status within each hiring source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
"Employment Status"
FROM
hr_dataset
GROUP BY
"Employee Source",
"Employment Status"
ORDER BY
"Employment Status"
),
status_selection AS -- selection: number of employees by employment status
(SELECT
"Employment Status",
COUNT("Employee Number") AS status_empl_count
FROM
hr_dataset
GROUP BY
"Employment Status"
),
abs_recruit_costs_per_status AS -- absolute hiring cost by employment status
(SELECT
"Employment Status",
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection
LEFT JOIN
costs_per_employee
USING(empl_source)
GROUP BY
"Employment Status"
ORDER BY
total_recruting_costs DESC
)
-- result per employment status: employee count, absolute hiring cost, average cost of hiring one person
(SELECT
"Employment Status",
status_empl_count::INTEGER AS "Employee Count",
total_recruting_costs,
total_recruting_costs / status_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_status -- absolute hiring cost per status
LEFT JOIN
status_selection -- employee count per status
USING("Employment Status")
ORDER BY
total_recruting_costs DESC
)
UNION ALL
(SELECT -- totals row
'[TOTAL]' AS "Employment Status",
SUM(status_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(status_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_status
LEFT JOIN
status_selection
USING("Employment Status")
)
;
"""
df_recruit_costs_per_status = pd.read_sql(sql_quiery, conn)
df_recruit_costs_per_status
| | Employment Status | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|---|
| 0 | Active | 183 | 51162 | 279 |
| 1 | Voluntarily Terminated | 88 | 24110 | 273 |
| 2 | Leave of Absence | 14 | 3749 | 267 |
| 3 | Future Start | 11 | 2885 | 262 |
| 4 | Terminated for Cause | 14 | 2553 | 182 |
| 5 | [TOTAL] | 310 | 84459 | 272 |
CONCLUSION
sql_quiery = \
"""
-- Uses the temporary view created earlier: per-source unit cost of hiring one employee
WITH
employee_selection AS -- selection: number of employees by department within each hiring source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
department
FROM
hr_dataset
GROUP BY
"Employee Source",
department
ORDER BY
department
),
department_selection AS -- selection: number of employees per department
(SELECT
department,
COUNT("Employee Number") AS dptmnt_empl_count
FROM
hr_dataset
GROUP BY
department
),
abs_recruit_costs_per_dptmnt AS -- absolute hiring cost per department
(SELECT
department,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees by department within each hiring source
LEFT JOIN
costs_per_employee -- per-source unit cost of hiring one employee
USING(empl_source)
GROUP BY
department
ORDER BY
total_recruting_costs DESC
)
-- result per department: employee count, absolute hiring cost, average cost of hiring one person
(SELECT
department,
dptmnt_empl_count::INTEGER AS "Employee Count",
total_recruting_costs,
total_recruting_costs / dptmnt_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_dptmnt -- absolute hiring cost per department
LEFT JOIN
department_selection -- employee count per department
USING(department)
ORDER BY
total_recruting_costs DESC
)
UNION ALL
(SELECT -- final totals row
'[TOTAL]' AS department,
SUM(dptmnt_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(dptmnt_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_dptmnt
LEFT JOIN
department_selection
USING(department)
)
;
"""
df_recruit_costs_per_dptmnt = pd.read_sql(sql_quiery, conn)
df_recruit_costs_per_dptmnt
| | department | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|---|
| 0 | Production | 208 | 64805 | 311 |
| 1 | Sales | 31 | 10341 | 333 |
| 2 | Admin Offices | 10 | 3332 | 333 |
| 3 | IT/IS | 50 | 3102 | 62 |
| 4 | Software Engineering | 10 | 2713 | 271 |
| 5 | Executive Office | 1 | 167 | 167 |
| 6 | [TOTAL] | 310 | 84460 | 272 |
CONCLUSION
sql_quiery = \
"""
-- Uses the temporary view created earlier: per-source unit cost of hiring one employee
WITH
employee_selection AS -- selection: number of employees by position within each department,
-- employment status and hiring source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
"Employment Status",
department,
position
FROM
hr_dataset
GROUP BY
"Employee Source",
"Employment Status",
department,
position
ORDER BY
"Employment Status",
department,
position
),
position_selection AS -- selection: number of employees by position within each employment status
(SELECT
"Employment Status",
position,
COUNT("Employee Number") AS pstn_empl_count
FROM
hr_dataset
GROUP BY
"Employment Status",
position
),
abs_recruit_costs_per_pstn_in_dptmnt AS -- absolute hiring cost by position within each department
-- and employment status
(SELECT
department,
"Employment Status",
position,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruiting_costs
FROM
employee_selection -- number of employees by position within each department and status
LEFT JOIN
costs_per_employee -- per-source unit cost of hiring one employee
USING(empl_source)
GROUP BY
department,
"Employment Status",
position
ORDER BY
total_recruiting_costs DESC
)
-- result per position: employee count, absolute hiring cost, average cost of hiring one person
(SELECT
department,
position,
"Employment Status",
pstn_empl_count::INTEGER AS "Employee Count",
total_recruiting_costs::INTEGER,
(total_recruiting_costs / pstn_empl_count)::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_pstn_in_dptmnt -- absolute hiring cost by position within each department
LEFT JOIN
position_selection -- employee count per position
USING(position, "Employment Status")
ORDER BY
department,
position,
"Employment Status",
total_recruiting_costs DESC
)
UNION ALL
(SELECT -- final totals row
'[TOTAL]' AS department,
'[control]' AS position,
'[control]' AS "Employment Status",
SUM(pstn_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruiting_costs)::INTEGER AS total_recruiting_costs,
(SUM(total_recruiting_costs) / SUM(pstn_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_pstn_in_dptmnt
LEFT JOIN
position_selection
USING(position, "Employment Status")
)
;
"""
dfg_recruit_costs_per_dptmnt_pos = pd.read_sql(sql_quiery, conn).fillna(0)
dfg_recruit_costs_per_dptmnt_pos
| | department | position | Employment Status | Employee Count | total_recruiting_costs | avg_recruiting_cost |
|---|---|---|---|---|---|---|
| 0 | Admin Offices | Accountant I | Active | 3 | 895 | 298 |
| 1 | Admin Offices | Administrative Assistant | Active | 2 | 717 | 358 |
| 2 | Admin Offices | Administrative Assistant | Voluntarily Terminated | 1 | 346 | 346 |
| 3 | Admin Offices | Shared Services Manager | Active | 1 | 240 | 240 |
| 4 | Admin Offices | Shared Services Manager | Voluntarily Terminated | 1 | 346 | 346 |
| 5 | Admin Offices | Sr. Accountant | Active | 2 | 789 | 394 |
| 6 | Executive Office | President & CEO | Active | 1 | 167 | 167 |
| 7 | IT/IS | BI Developer | Active | 4 | 0 | 0 |
| 8 | IT/IS | BI Director | Active | 1 | 60 | 60 |
| 9 | IT/IS | CIO | Active | 1 | 0 | 0 |
| 10 | IT/IS | Data Architect | Active | 1 | 0 | 0 |
| 11 | IT/IS | Database Administrator | Active | 7 | 513 | 73 |
| 12 | IT/IS | Database Administrator | Leave of Absence | 1 | 0 | 0 |
| 13 | IT/IS | Database Administrator | Terminated for Cause | 3 | 207 | 69 |
| 14 | IT/IS | Database Administrator | Voluntarily Terminated | 2 | 0 | 0 |
| 15 | IT/IS | IT Director | Active | 1 | 60 | 60 |
| 16 | IT/IS | IT Manager - DB | Active | 1 | 60 | 60 |
| 17 | IT/IS | IT Manager - DB | Voluntarily Terminated | 1 | 346 | 346 |
| 18 | IT/IS | IT Manager - Infra | Active | 1 | 346 | 346 |
| 19 | IT/IS | IT Manager - Support | Active | 1 | 60 | 60 |
| 20 | IT/IS | IT Support | Active | 4 | 346 | 86 |
| 21 | IT/IS | Network Engineer | Active | 8 | 240 | 30 |
| 22 | IT/IS | Network Engineer | Voluntarily Terminated | 1 | 240 | 240 |
| 23 | IT/IS | Senior BI Developer | Active | 3 | 0 | 0 |
| 24 | IT/IS | Sr. DBA | Future Start | 1 | 0 | 0 |
| 25 | IT/IS | Sr. DBA | Terminated for Cause | 1 | 0 | 0 |
| 26 | IT/IS | Sr. DBA | Voluntarily Terminated | 2 | 0 | 0 |
| 27 | IT/IS | Sr. Network Engineer | Active | 2 | 0 | 0 |
| 28 | IT/IS | Sr. Network Engineer | Future Start | 1 | 625 | 625 |
| 29 | IT/IS | Sr. Network Engineer | Leave of Absence | 2 | 0 | 0 |
| 30 | Production | Director of Operations | Active | 1 | 444 | 444 |
| 31 | Production | Production Manager | Active | 9 | 1551 | 172 |
| 32 | Production | Production Manager | Terminated for Cause | 1 | 0 | 0 |
| 33 | Production | Production Manager | Voluntarily Terminated | 4 | 1176 | 294 |
| 34 | Production | Production Technician I | Active | 73 | 21571 | 295 |
| 35 | Production | Production Technician I | Future Start | 4 | 1267 | 316 |
| 36 | Production | Production Technician I | Leave of Absence | 7 | 3344 | 477 |
| 37 | Production | Production Technician I | Terminated for Cause | 7 | 1719 | 245 |
| 38 | Production | Production Technician I | Voluntarily Terminated | 45 | 13393 | 297 |
| 39 | Production | Production Technician II | Active | 23 | 13391 | 582 |
| 40 | Production | Production Technician II | Future Start | 4 | 444 | 111 |
| 41 | Production | Production Technician II | Leave of Absence | 4 | 406 | 101 |
| 42 | Production | Production Technician II | Voluntarily Terminated | 26 | 6100 | 234 |
| 43 | Sales | Area Sales Manager | Active | 23 | 6920 | 300 |
| 44 | Sales | Area Sales Manager | Future Start | 1 | 549 | 549 |
| 45 | Sales | Area Sales Manager | Terminated for Cause | 1 | 240 | 240 |
| 46 | Sales | Area Sales Manager | Voluntarily Terminated | 2 | 967 | 483 |
| 47 | Sales | Director of Sales | Active | 1 | 646 | 646 |
| 48 | Sales | Sales Manager | Active | 2 | 513 | 256 |
| 49 | Sales | Sales Manager | Voluntarily Terminated | 1 | 507 | 507 |
| 50 | Software Engineering | Software Engineer | Active | 6 | 1427 | 237 |
| 51 | Software Engineering | Software Engineer | Terminated for Cause | 1 | 387 | 387 |
| 52 | Software Engineering | Software Engineer | Voluntarily Terminated | 2 | 691 | 345 |
| 53 | Software Engineering | Software Engineering Manager | Active | 1 | 207 | 207 |
| 54 | [TOTAL] | [control] | [control] | 310 | 84463 | 272 |
# Build a grid of plots: total recruiting costs for each position,
# broken down by department and employment status
g=sns.catplot(x="position",
y="total_recruiting_costs",
col="department",
col_wrap=2,
hue="Employment Status",
hue_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
data=dfg_recruit_costs_per_dptmnt_pos.iloc[0:-1], # all rows except the final control row
kind='bar',
height=5,
aspect=1.5,
sharex=False,
sharey=False,
dodge=True,
ci=False
)
g.fig.suptitle("Recruiting costs by position, department and employment status, USD",
fontsize=16, x=0.50, y=1.025)
# Set the X-axis label and its size
g.set_xlabels("Position", fontsize=14)
# Set the Y-axis label and its size
g.set_ylabels("Recruiting costs per position", fontsize=14)
# Set the size and rotation of the tick labels under the bars
g.set_xticklabels(rotation=90,fontsize=12)
# Adjust the spacing between the individual plots
g.fig.subplots_adjust(hspace=0.7)
g1=sns.catplot(x="position",
y="avg_recruiting_cost",
col="department",
col_wrap=2,
hue="Employment Status",
hue_order=["Active","Leave of Absence", "Future Start", "Voluntarily Terminated", "Terminated for Cause"],
data=dfg_recruit_costs_per_dptmnt_pos.iloc[0:-1], # all rows except the final control row
kind='bar',
height=5,
aspect=1.5,
sharex=False,
sharey=True,
dodge=True,
ci=False
)
g1.fig.suptitle("Recruiting cost per employee by position, department and employment status,\n"
"USD per person",
fontsize=16, x=0.50, y=1.025)
# Set the X-axis label and its size
g1.set_xlabels("Position", fontsize=14)
# Set the Y-axis label and its size
g1.set_ylabels("Average recruiting cost per employee", fontsize=14)
# Set the size and rotation of the tick labels under the bars
g1.set_xticklabels(rotation=90,fontsize=12)
# Adjust the spacing between the individual plots
g1.fig.subplots_adjust(hspace=0.7)
plt.show()
CONCLUSION
Total recruiting costs by position and employment status:
Average recruiting cost per employee by position and employment status:
sql_quiery = \
"""
-- Uses the temporary view created earlier (costs_per_employee): per-hire recruiting cost for each source
WITH
employee_selection AS -- number of employees per hire month within each recruiting source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
DATE_TRUNC('month', "Date of Hire") AS empl_date -- truncate the date to the month
FROM
hr_dataset
GROUP BY
"Employee Source",
DATE_TRUNC('month', "Date of Hire")
ORDER BY
DATE_TRUNC('month', "Date of Hire")
),
empl_date_selection AS -- number of employees per hire month
(SELECT
DATE_TRUNC('month', "Date of Hire") AS empl_date, -- truncate the date to the month
COUNT("Employee Number") AS empl_date_empl_count
FROM
hr_dataset
GROUP BY
DATE_TRUNC('month', "Date of Hire")
),
abs_recruit_costs_per_empl_date AS -- total recruiting costs per hire month
(SELECT
empl_date,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees per hire month within each recruiting source
LEFT JOIN
costs_per_employee -- per-hire recruiting cost for each source
USING(empl_source)
GROUP BY
empl_date
ORDER BY
total_recruting_costs DESC
)
-- result, per hire month: number of employees, total recruiting costs, average recruiting cost per employee
(SELECT
empl_date,
empl_date_empl_count AS "Employee Count",
total_recruting_costs,
total_recruting_costs / empl_date_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_empl_date -- total recruiting costs per hire month
LEFT JOIN
empl_date_selection -- number of employees per hire month
USING(empl_date)
ORDER BY
empl_date
)
UNION ALL
(SELECT -- control row with totals
make_date(2017, 11, 30) AS empl_date,
SUM(empl_date_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(empl_date_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_empl_date
LEFT JOIN
empl_date_selection
USING(empl_date)
)
;
"""
dfg_recruit_costs_per_empl_date = pd.read_sql(sql_quiery, conn, index_col="empl_date")
dfg_recruit_costs_per_empl_date
| empl_date | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|
| 2006-01-01 00:00:00+00:00 | 1 | 387 | 387 |
| 2007-06-01 00:00:00+00:00 | 1 | 167 | 167 |
| 2007-11-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2008-01-01 00:00:00+00:00 | 1 | 387 | 387 |
| 2008-09-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2008-10-01 00:00:00+00:00 | 1 | 346 | 346 |
| 2009-01-01 00:00:00+00:00 | 4 | 1275 | 318 |
| 2009-04-01 00:00:00+00:00 | 1 | 346 | 346 |
| 2009-07-01 00:00:00+00:00 | 1 | 346 | 346 |
| 2009-10-01 00:00:00+00:00 | 1 | 240 | 240 |
| 2010-04-01 00:00:00+00:00 | 3 | 553 | 184 |
| 2010-05-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2010-07-01 00:00:00+00:00 | 1 | 240 | 240 |
| 2010-08-01 00:00:00+00:00 | 2 | 684 | 342 |
| 2010-09-01 00:00:00+00:00 | 1 | 60 | 60 |
| 2010-10-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2011-01-01 00:00:00+00:00 | 15 | 3164 | 210 |
| 2011-02-01 00:00:00+00:00 | 6 | 2250 | 375 |
| 2011-03-01 00:00:00+00:00 | 2 | 806 | 403 |
| 2011-04-01 00:00:00+00:00 | 8 | 8919 | 1114 |
| 2011-05-01 00:00:00+00:00 | 12 | 4507 | 375 |
| 2011-06-01 00:00:00+00:00 | 2 | 444 | 222 |
| 2011-07-01 00:00:00+00:00 | 13 | 4483 | 344 |
| 2011-08-01 00:00:00+00:00 | 5 | 1360 | 272 |
| 2011-09-01 00:00:00+00:00 | 10 | 2812 | 281 |
| 2011-10-01 00:00:00+00:00 | 1 | 167 | 167 |
| 2011-11-01 00:00:00+00:00 | 10 | 2719 | 271 |
| 2012-01-01 00:00:00+00:00 | 8 | 3219 | 402 |
| 2012-02-01 00:00:00+00:00 | 5 | 2249 | 449 |
| 2012-03-01 00:00:00+00:00 | 2 | 549 | 274 |
| 2012-04-01 00:00:00+00:00 | 10 | 2679 | 267 |
| 2012-05-01 00:00:00+00:00 | 4 | 668 | 167 |
| 2012-07-01 00:00:00+00:00 | 4 | 1661 | 415 |
| 2012-08-01 00:00:00+00:00 | 4 | 747 | 186 |
| 2012-09-01 00:00:00+00:00 | 4 | 1710 | 427 |
| 2012-10-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2012-11-01 00:00:00+00:00 | 2 | 628 | 314 |
| 2013-01-01 00:00:00+00:00 | 5 | 527 | 105 |
| 2013-02-01 00:00:00+00:00 | 2 | 733 | 366 |
| 2013-04-01 00:00:00+00:00 | 3 | 991 | 330 |
| 2013-05-01 00:00:00+00:00 | 3 | 1658 | 552 |
| 2013-07-01 00:00:00+00:00 | 9 | 1959 | 217 |
| 2013-08-01 00:00:00+00:00 | 6 | 1700 | 283 |
| 2013-09-01 00:00:00+00:00 | 9 | 2323 | 258 |
| 2013-11-01 00:00:00+00:00 | 7 | 1806 | 258 |
| 2014-01-01 00:00:00+00:00 | 6 | 2245 | 374 |
| 2014-02-01 00:00:00+00:00 | 7 | 2416 | 345 |
| 2014-03-01 00:00:00+00:00 | 3 | 706 | 235 |
| 2014-05-01 00:00:00+00:00 | 10 | 3040 | 304 |
| 2014-07-01 00:00:00+00:00 | 9 | 1500 | 166 |
| 2014-08-01 00:00:00+00:00 | 3 | 554 | 184 |
| 2014-09-01 00:00:00+00:00 | 13 | 2592 | 199 |
| 2014-11-01 00:00:00+00:00 | 8 | 991 | 123 |
| 2014-12-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2015-01-01 00:00:00+00:00 | 11 | 1176 | 106 |
| 2015-02-01 00:00:00+00:00 | 8 | 1469 | 183 |
| 2015-03-01 00:00:00+00:00 | 12 | 928 | 77 |
| 2015-05-01 00:00:00+00:00 | 2 | 609 | 304 |
| 2015-06-01 00:00:00+00:00 | 2 | 444 | 222 |
| 2015-07-01 00:00:00+00:00 | 1 | 461 | 461 |
| 2016-01-01 00:00:00+00:00 | 2 | 407 | 203 |
| 2016-05-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2016-06-01 00:00:00+00:00 | 3 | 625 | 208 |
| 2016-07-01 00:00:00+00:00 | 5 | 1356 | 271 |
| 2016-09-01 00:00:00+00:00 | 1 | 60 | 60 |
| 2016-10-01 00:00:00+00:00 | 2 | 0 | 0 |
| 2017-01-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2017-02-01 00:00:00+00:00 | 3 | 0 | 0 |
| 2017-04-01 00:00:00+00:00 | 2 | 0 | 0 |
| 2017-11-30 00:00:00+00:00 | 310 | 84462 | 272 |
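
The arithmetic behind this table can be sketched in plain Python: add up the per-source cost of every hire within a month, round the monthly total to an integer, then divide by the head count with integer division (which is why 1275 / 4 appears as 318). The source names and per-hire costs below are hypothetical and only stand in for the `costs_per_employee` view; they are not values from the database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-hire cost by source (stands in for the costs_per_employee view)
cost_per_hire = {"Job board": 240.4, "Referral": 0.0, "Career fair": 346.2}

# Hypothetical hires: (date of hire, recruiting source)
hires = [
    (date(2011, 4, 4), "Job board"),
    (date(2011, 4, 18), "Career fair"),
    (date(2011, 4, 25), "Referral"),
    (date(2011, 5, 2), "Job board"),
]


def month_start(d):
    """Counterpart of DATE_TRUNC('month', d): the first day of the month."""
    return d.replace(day=1)


total_cost = defaultdict(float)
head_count = defaultdict(int)
for hired_on, source in hires:
    month = month_start(hired_on)
    total_cost[month] += cost_per_hire[source]
    head_count[month] += 1

report = {}
for month in sorted(total_cost):
    total = round(total_cost[month])        # ROUND(SUM(...), 0)::INTEGER
    average = total // head_count[month]    # integer division, as in the query
    report[month] = (head_count[month], total, average)

for month, row in report.items():
    print(month, row)
```

Each value in `report` is a `(head count, total cost, average cost)` tuple, mirroring the three columns of the table.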
# Plot total and average recruiting costs against the month of hire
# From the DataFrame above, select the required columns and drop the final control row
data=dfg_recruit_costs_per_empl_date[["total_recruting_costs", "avg_recruiting_cost"]].iloc[0:-1]
# With neither x nor y specified, seaborn treats the data as wide-form
g = sns.relplot(data=data,
palette="Set1",
kind='line',
linewidth=2,
height=7,
aspect=2.2,
marker='o'
)
g.fig.suptitle("Total and average recruiting costs by date (month) of hire",
fontsize=16, x=0.50, y=1.015)
# Define the X ticks as a pandas date range with quarterly intervals (for readability):
scale_span = pd.date_range(start=data.index.min(),
end= data.index.max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
# Set the X-axis label and its size
g.set_xlabels("Date of Hire", fontsize=12)
# Set the Y-axis label and its size
g.set_ylabels("USD", fontsize=12)
#g.ax.set_yscale('log') # optional: logarithmic scale for readability
plt.show()
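The quarterly tick positions produced by `pd.date_range(..., freq='QS')` can be illustrated without pandas. This is a minimal standard-library sketch; the helper name `quarter_starts` is made up for the illustration, and the date bounds are the first and last hire months from the table above.

```python
from datetime import date


def quarter_starts(start, end):
    """Return the first day of every quarter that falls within [start, end]."""
    ticks = []
    year = start.year
    month = ((start.month - 1) // 3) * 3 + 1  # first month of the starting quarter
    while True:
        tick = date(year, month, 1)
        if tick > end:
            break
        if tick >= start:
            ticks.append(tick)
        month += 3
        if month > 12:
            month -= 12
            year += 1
    return ticks


ticks = quarter_starts(date(2006, 1, 1), date(2017, 4, 1))
print(len(ticks), ticks[0], ticks[-1])
```

Using quarter starts rather than every month keeps the rotated labels readable on a series that spans more than eleven years.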
CONCLUSION
The plot shows:
sql_quiery = \
"""
-- Uses the temporary view created earlier (costs_per_employee): per-hire recruiting cost for each source
WITH
employee_selection AS -- number of employees per termination month within each recruiting source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
DATE_TRUNC('month', "Date of Termination") AS term_date -- truncate the date to the month
FROM
hr_dataset
WHERE
"Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause'
GROUP BY
"Employee Source",
DATE_TRUNC('month', "Date of Termination")
ORDER BY
DATE_TRUNC('month', "Date of Termination")
),
term_date_selection AS -- number of employees per termination month
(SELECT
DATE_TRUNC('month', "Date of Termination") AS term_date, -- truncate the date to the month
COUNT("Employee Number") AS term_date_empl_count
FROM
hr_dataset
WHERE
"Employment Status" = 'Voluntarily Terminated' OR
"Employment Status" = 'Terminated for Cause'
GROUP BY
DATE_TRUNC('month', "Date of Termination")
),
abs_recruit_costs_per_term_date AS -- total recruiting costs per termination month
(SELECT
term_date,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees per termination month within each recruiting source
LEFT JOIN
costs_per_employee -- per-hire recruiting cost for each source
USING(empl_source)
GROUP BY
term_date
ORDER BY
total_recruting_costs DESC
)
-- result, per termination month: number of employees, total recruiting costs, average recruiting cost per employee
(SELECT
term_date,
term_date_empl_count AS "Employee Count",
total_recruting_costs,
total_recruting_costs / term_date_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_term_date -- total recruiting costs per termination month
LEFT JOIN
term_date_selection -- number of employees per termination month
USING(term_date)
ORDER BY
term_date
)
UNION ALL
(SELECT -- control row with totals
make_date(2017, 11, 30) AS term_date,
SUM(term_date_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(term_date_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_term_date
LEFT JOIN
term_date_selection
USING(term_date)
)
;
"""
dfg_recruit_costs_per_term_date = pd.read_sql(sql_quiery, conn, index_col="term_date")
dfg_recruit_costs_per_term_date
| term_date | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|
| 2010-07-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2010-08-01 00:00:00+00:00 | 1 | 167 | 167 |
| 2011-01-01 00:00:00+00:00 | 1 | 346 | 346 |
| 2011-05-01 00:00:00+00:00 | 3 | 1313 | 437 |
| 2011-06-01 00:00:00+00:00 | 1 | 507 | 507 |
| 2011-08-01 00:00:00+00:00 | 2 | 967 | 483 |
| 2011-09-01 00:00:00+00:00 | 5 | 1071 | 214 |
| 2011-10-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2011-11-01 00:00:00+00:00 | 1 | 346 | 346 |
| 2012-01-01 00:00:00+00:00 | 3 | 553 | 184 |
| 2012-02-01 00:00:00+00:00 | 2 | 167 | 83 |
| 2012-04-01 00:00:00+00:00 | 2 | 167 | 83 |
| 2012-07-01 00:00:00+00:00 | 2 | 207 | 103 |
| 2012-08-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2012-09-01 00:00:00+00:00 | 5 | 3219 | 643 |
| 2012-11-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2012-12-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2013-01-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2013-02-01 00:00:00+00:00 | 1 | 240 | 240 |
| 2013-04-01 00:00:00+00:00 | 4 | 553 | 138 |
| 2013-06-01 00:00:00+00:00 | 5 | 1695 | 339 |
| 2013-08-01 00:00:00+00:00 | 2 | 701 | 350 |
| 2013-09-01 00:00:00+00:00 | 2 | 971 | 485 |
| 2014-01-01 00:00:00+00:00 | 2 | 747 | 373 |
| 2014-02-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2014-03-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2014-04-01 00:00:00+00:00 | 4 | 1405 | 351 |
| 2014-05-01 00:00:00+00:00 | 1 | 207 | 207 |
| 2014-08-01 00:00:00+00:00 | 2 | 894 | 447 |
| 2014-09-01 00:00:00+00:00 | 2 | 646 | 323 |
| 2014-10-01 00:00:00+00:00 | 1 | 461 | 461 |
| 2015-02-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2015-03-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2015-04-01 00:00:00+00:00 | 1 | 240 | 240 |
| 2015-05-01 00:00:00+00:00 | 1 | 240 | 240 |
| 2015-06-01 00:00:00+00:00 | 5 | 1931 | 386 |
| 2015-08-01 00:00:00+00:00 | 1 | 346 | 346 |
| 2015-09-01 00:00:00+00:00 | 6 | 1790 | 298 |
| 2015-10-01 00:00:00+00:00 | 2 | 646 | 323 |
| 2015-11-01 00:00:00+00:00 | 6 | 1688 | 281 |
| 2015-12-01 00:00:00+00:00 | 2 | 240 | 120 |
| 2016-01-01 00:00:00+00:00 | 2 | 60 | 30 |
| 2016-02-01 00:00:00+00:00 | 4 | 567 | 141 |
| 2016-04-01 00:00:00+00:00 | 2 | 480 | 240 |
| 2016-05-01 00:00:00+00:00 | 5 | 60 | 12 |
| 2016-06-01 00:00:00+00:00 | 1 | 0 | 0 |
| 2017-11-30 00:00:00+00:00 | 102 | 26666 | 261 |
# Plot total and average recruiting costs against the month of termination
# From the DataFrame above, select the required columns and drop the final control row
data=dfg_recruit_costs_per_term_date[["total_recruting_costs", "avg_recruiting_cost"]].iloc[0:-1]
# With neither x nor y specified, seaborn treats the data as wide-form
g = sns.relplot(data=data,
palette="Set1",
kind='line',
linewidth=2,
height=7,
aspect=2.2,
marker='o'
)
g.fig.suptitle("Total and average recruiting costs by date (month) of termination",
fontsize=16, x=0.50, y=1.015)
# Define the X ticks as a pandas date range with quarterly intervals (for readability):
scale_span = pd.date_range(start=data.index.min(),
end= data.index.max(),
freq='QS').tolist()
plt.xticks(ticks=scale_span, fontsize=12, rotation=90) # set the X ticks and their label size
# Format the tick labels as YYYY-MM
g.axes[0,0].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m'))
# Set the X-axis label and its size
g.set_xlabels("Date of Termination", fontsize=12)
# Set the Y-axis label and its size
g.set_ylabels("USD", fontsize=12)
#g.ax.set_yscale('log') # optional: logarithmic scale for readability
plt.show()
CONCLUSION
sql_quiery = \
"""
-- Uses the temporary view created earlier (costs_per_employee): per-hire recruiting cost for each source
WITH
employee_selection AS -- number of employees per age at hire within each recruiting source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
-- convert the interval between date of birth and date of hire to full years:
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER) AS age_of_hire
FROM
hr_dataset
GROUP BY
"Employee Source",
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER)
ORDER BY
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER)
),
age_of_hire_selection AS -- number of employees per age at hire
(SELECT
-- convert the interval between date of birth and date of hire to full years:
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER) AS age_of_hire,
COUNT("Employee Number") AS age_of_hire_empl_count
FROM
hr_dataset
GROUP BY
CAST(EXTRACT(year FROM AGE("Date of Hire", dob)) AS INTEGER)
),
abs_recruit_costs_per_age_of_hire AS -- total recruiting costs per age at hire
(SELECT
age_of_hire,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees per age at hire within each recruiting source
LEFT JOIN
costs_per_employee -- per-hire recruiting cost for each source
USING(empl_source)
GROUP BY
age_of_hire
ORDER BY
total_recruting_costs DESC
)
-- result, per age at hire: number of employees, total recruiting costs, average recruiting cost per employee
(SELECT
age_of_hire,
age_of_hire_empl_count AS "Employee Count",
total_recruting_costs,
total_recruting_costs / age_of_hire_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_age_of_hire -- total recruiting costs per age at hire
LEFT JOIN
age_of_hire_selection -- number of employees per age at hire
USING(age_of_hire)
ORDER BY
age_of_hire
)
UNION ALL
(SELECT -- control row with totals
'0' AS age_of_hire,
SUM(age_of_hire_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(age_of_hire_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_age_of_hire
LEFT JOIN
age_of_hire_selection
USING(age_of_hire)
)
;
"""
dfg_recruit_costs_per_age = pd.read_sql(sql_quiery, conn)
dfg_recruit_costs_per_age
| | age_of_hire | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|---|
| 0 | 19 | 3 | 654 | 218 |
| 1 | 20 | 3 | 1029 | 343 |
| 2 | 21 | 7 | 1254 | 179 |
| 3 | 22 | 7 | 2412 | 344 |
| 4 | 23 | 8 | 2932 | 366 |
| 5 | 24 | 15 | 4123 | 274 |
| 6 | 25 | 14 | 3519 | 251 |
| 7 | 26 | 17 | 4875 | 286 |
| 8 | 27 | 17 | 4270 | 251 |
| 9 | 28 | 15 | 2719 | 181 |
| 10 | 29 | 18 | 4454 | 247 |
| 11 | 30 | 14 | 1849 | 132 |
| 12 | 31 | 13 | 4012 | 308 |
| 13 | 32 | 11 | 2792 | 253 |
| 14 | 33 | 17 | 11212 | 659 |
| 15 | 34 | 12 | 3402 | 283 |
| 16 | 35 | 10 | 646 | 64 |
| 17 | 36 | 9 | 3053 | 339 |
| 18 | 37 | 8 | 2339 | 292 |
| 19 | 38 | 12 | 2836 | 236 |
| 20 | 39 | 10 | 3452 | 345 |
| 21 | 40 | 8 | 3164 | 395 |
| 22 | 41 | 6 | 691 | 115 |
| 23 | 42 | 6 | 1779 | 296 |
| 24 | 43 | 8 | 1774 | 221 |
| 25 | 44 | 6 | 673 | 112 |
| 26 | 45 | 5 | 1628 | 325 |
| 27 | 46 | 2 | 387 | 193 |
| 28 | 47 | 3 | 567 | 189 |
| 29 | 48 | 5 | 1577 | 315 |
| 30 | 49 | 4 | 865 | 216 |
| 31 | 50 | 2 | 167 | 83 |
| 32 | 51 | 1 | 387 | 387 |
| 33 | 52 | 2 | 240 | 120 |
| 34 | 54 | 2 | 447 | 223 |
| 35 | 55 | 1 | 346 | 346 |
| 36 | 56 | 1 | 207 | 207 |
| 37 | 57 | 2 | 374 | 187 |
| 38 | 59 | 2 | 480 | 240 |
| 39 | 60 | 2 | 668 | 334 |
| 40 | 62 | 1 | 0 | 0 |
| 41 | 63 | 1 | 207 | 207 |
| 42 | 0 | 310 | 84462 | 272 |
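
The `age_of_hire` column is the number of full years between the date of birth and the date of hire, which is what `EXTRACT(year FROM AGE(...))` yields. A minimal standard-library sketch of that calculation follows; the function name and the sample dates are hypothetical.

```python
from datetime import date


def age_at_hire(date_of_birth, date_of_hire):
    """Full years elapsed between the date of birth and the date of hire."""
    years = date_of_hire.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the hire year
    if (date_of_hire.month, date_of_hire.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hired one day before and exactly on the 26th birthday
print(age_at_hire(date(1985, 6, 15), date(2011, 6, 14)))
print(age_at_hire(date(1985, 6, 15), date(2011, 6, 15)))
```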
# Build a combined plot of recruiting costs against employee age at hire
fig, ax = plt.subplots(figsize=(15, 5), ncols=1, nrows=1)
data=dfg_recruit_costs_per_age.iloc[:-1] # drop the final control row from the DataFrame
sns.lineplot(data=data,
x="age_of_hire",
y="total_recruting_costs",
color='salmon',
linewidth=2,
marker='o',
ax=ax
)
ax1 = ax.twinx()
sns.lineplot(data=data,
x="age_of_hire",
y="avg_recruiting_cost",
color='cadetblue',
linewidth=2,
marker="o",
ax=ax1
)
# Set the figure title
fig.suptitle("Total and average recruiting costs by employee age at hire",
fontsize=16, x=0.50, y=1.02)
# Build the figure legend
line, = ax.plot([1], label='Total costs', color='salmon')
line1, = ax1.plot([1],label='Average costs', color='cadetblue')
plt.figlegend(handles=[line, line1], bbox_to_anchor=(0.25, 0.475, 0.5, 0.5), loc='upper center', ncol=2)
# Define the X axis:
scale_step = 5 # tick step for the X axis
# Main tick range
scale_span = list(range((int(data.age_of_hire.min() / scale_step) * scale_step),
int(data.age_of_hire.max() + scale_step),
scale_step))
ax.set_xlim(scale_span[0],scale_span[-1]) # set the X-axis limits
plt.xticks(ticks=scale_span, fontsize=14, rotation=0) # set the X ticks and their label size
plt.show()
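The tick range used for the age axis rounds the smallest age down to a multiple of the step and extends past the largest age, so the whole series fits between the first and last tick. A small sketch of the same expression, wrapped in a hypothetical helper `axis_ticks` and fed with the minimum and maximum hire ages from the table above (19 and 63):

```python
def axis_ticks(lowest, highest, step=5):
    """Ticks from the multiple of `step` at or below `lowest` up to cover `highest`."""
    start = int(lowest / step) * step
    return list(range(start, int(highest + step), step))


ticks = axis_ticks(19, 63)
print(ticks)
```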
CONCLUSION
sql_quiery = \
"""
-- Uses the temporary view created earlier (costs_per_employee): per-hire recruiting cost for each source
WITH
employee_selection AS -- number of employees per sex within each recruiting source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
sex
FROM
hr_dataset
GROUP BY
"Employee Source",
sex
ORDER BY
sex
),
sex_selection AS -- number of employees per sex
(SELECT
sex,
COUNT("Employee Number") AS sex_empl_count
FROM
hr_dataset
GROUP BY
sex
),
abs_recruit_costs_per_sex AS -- total recruiting costs per sex
(SELECT
sex,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees per sex within each recruiting source
LEFT JOIN
costs_per_employee -- per-hire recruiting cost for each source
USING(empl_source)
GROUP BY
sex
ORDER BY
total_recruting_costs DESC
)
-- result, per sex: number of employees, total recruiting costs, average recruiting cost per employee
(SELECT
sex,
sex_empl_count AS "Employee Count",
total_recruting_costs,
total_recruting_costs / sex_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_sex -- total recruiting costs per sex
LEFT JOIN
sex_selection -- number of employees per sex
USING(sex)
ORDER BY
sex
)
UNION ALL
(SELECT -- control row with totals
'[TOTAL]' AS sex,
SUM(sex_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(sex_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_sex
LEFT JOIN
sex_selection
USING(sex)
)
;
"""
df_recruit_costs_per_sex = pd.read_sql(sql_quiery, conn)
df_recruit_costs_per_sex
| | sex | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|---|
| 0 | Female | 177 | 44846 | 253 |
| 1 | Male | 133 | 39614 | 297 |
| 2 | [TOTAL] | 310 | 84460 | 272 |
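
The control total here (84460) differs slightly from the control totals of the hire-month and hire-age tables (84462). This is consistent with each group's cost being rounded to an integer before the rounded values are summed, so the total depends on how the rows are grouped. A minimal sketch with hypothetical per-employee costs (not values from the database):

```python
# Hypothetical unrounded recruiting cost per employee
costs = [100.3, 100.3, 100.3, 100.3]


def control_total(groups):
    """Round each group's sum to an integer first, then add the rounded values."""
    return sum(round(sum(group)) for group in groups)


one_group = control_total([costs])                   # a single group with all rows
two_groups = control_total([costs[:2], costs[2:]])   # two groups of two rows
four_groups = control_total([[c] for c in costs])    # every row is its own group

print(one_group, two_groups, four_groups)
```

Computing the control row from the unrounded costs, rather than from the per-group rounded values, would give the same total in every breakdown.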
CONCLUSION
sql_quiery = \
"""
-- Uses the temporary view created earlier (costs_per_employee): per-hire recruiting cost for each source
WITH
employee_selection AS -- number of employees per race/ethnicity within each recruiting source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
racedesc
FROM
hr_dataset
GROUP BY
"Employee Source",
racedesc
ORDER BY
racedesc
),
racedesc_selection AS -- number of employees per race/ethnicity
(SELECT
racedesc,
COUNT("Employee Number") AS racedesc_empl_count
FROM
hr_dataset
GROUP BY
racedesc
),
abs_recruit_costs_per_racedesc AS -- total recruiting costs per race/ethnicity
(SELECT
racedesc,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees per race/ethnicity within each recruiting source
LEFT JOIN
costs_per_employee -- per-hire recruiting cost for each source
USING(empl_source)
GROUP BY
racedesc
ORDER BY
total_recruting_costs DESC
)
-- result, per race/ethnicity: number of employees, total recruiting costs, average recruiting cost per employee
(SELECT
racedesc,
racedesc_empl_count AS "Employee Count",
total_recruting_costs,
total_recruting_costs / racedesc_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_racedesc -- total recruiting costs per race/ethnicity
LEFT JOIN
racedesc_selection -- number of employees per race/ethnicity
USING(racedesc)
ORDER BY
total_recruting_costs DESC
)
UNION ALL
(SELECT -- control row with totals
'[TOTAL]' AS racedesc,
SUM(racedesc_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(racedesc_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_racedesc
LEFT JOIN
racedesc_selection
USING(racedesc)
)
;
"""
df_recruit_costs_per_racedesc = pd.read_sql(sql_quiery, conn)
df_recruit_costs_per_racedesc
| | racedesc | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|---|
| 0 | White | 193 | 50521 | 261 |
| 1 | Black or African American | 57 | 16158 | 283 |
| 2 | Asian | 34 | 9998 | 294 |
| 3 | Two or more races | 18 | 5588 | 310 |
| 4 | American Indian or Alaska Native | 4 | 1159 | 289 |
| 5 | Hispanic | 4 | 1037 | 259 |
| 6 | [TOTAL] | 310 | 84461 | 272 |
CONCLUSION
sql_quiery = \
"""
-- Uses the temporary view created earlier (costs_per_employee): per-hire recruiting cost for each source
WITH
employee_selection AS -- number of employees per marital status within each recruiting source
(SELECT
COUNT("Employee Number") AS empl_count,
"Employee Source" AS empl_source,
maritaldesc
FROM
hr_dataset
GROUP BY
"Employee Source",
maritaldesc
ORDER BY
maritaldesc
),
maritaldesc_selection AS -- number of employees per marital status
(SELECT
maritaldesc,
COUNT("Employee Number") AS maritaldesc_empl_count
FROM
hr_dataset
GROUP BY
maritaldesc
),
abs_recruit_costs_per_maritaldesc AS -- total recruiting costs per marital status
(SELECT
maritaldesc,
ROUND(SUM(empl_count * empl_cost),0)::INTEGER AS total_recruting_costs
FROM
employee_selection -- number of employees per marital status within each recruiting source
LEFT JOIN
costs_per_employee -- per-hire recruiting cost for each source
USING(empl_source)
GROUP BY
maritaldesc
ORDER BY
total_recruting_costs DESC
)
-- result, per marital status: number of employees, total recruiting costs, average recruiting cost per employee
(SELECT
maritaldesc,
maritaldesc_empl_count AS "Employee Count",
total_recruting_costs,
total_recruting_costs / maritaldesc_empl_count AS avg_recruiting_cost
FROM
abs_recruit_costs_per_maritaldesc -- total recruiting costs per marital status
LEFT JOIN
maritaldesc_selection -- number of employees per marital status
USING(maritaldesc)
ORDER BY
total_recruting_costs DESC
)
UNION ALL
(SELECT -- control row with totals
'[TOTAL]' AS maritaldesc,
SUM(maritaldesc_empl_count)::INTEGER AS "Employee Count",
SUM(total_recruting_costs) AS total_recruting_costs,
(SUM(total_recruting_costs) / SUM(maritaldesc_empl_count))::INTEGER AS avg_recruiting_cost
FROM
abs_recruit_costs_per_maritaldesc
LEFT JOIN
maritaldesc_selection
USING(maritaldesc)
)
;
"""
df_recruit_costs_per_maritaldesc = pd.read_sql(sql_quiery, conn)
df_recruit_costs_per_maritaldesc
| | maritaldesc | Employee Count | total_recruting_costs | avg_recruiting_cost |
|---|---|---|---|---|
| 0 | Married | 123 | 36809 | 299 |
| 1 | Single | 137 | 32915 | 240 |
| 2 | Divorced | 30 | 7511 | 250 |
| 3 | Separated | 12 | 4245 | 353 |
| 4 | Widowed | 8 | 2980 | 372 |
| 5 | [TOTAL] | 310 | 84460 | 272 |
CONCLUSION
# Build DataFrames with social and demographic indicators by age.
sql_quiery = \
"""
SELECT
COUNT("Employee Number") AS empl_count,
sex,
age
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
sex,
age
ORDER BY
sex DESC,
age
;
"""
dfg_sex_over_age = pd.read_sql(sql_quiery, conn)
#dfg_sex_over_age
sql_quiery = \
"""
SELECT
COUNT("Employee Number") AS empl_count,
racedesc,
age
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
racedesc,
age
ORDER BY
racedesc,
age
;
"""
dfg_race_over_age = pd.read_sql(sql_quiery, conn)
#dfg_race_over_age
sql_quiery = \
"""
SELECT
COUNT("Employee Number") AS empl_count,
maritaldesc,
age
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
GROUP BY
maritaldesc,
age
ORDER BY
age
;
"""
dfg_marital_over_age = pd.read_sql(sql_quiery, conn)
#dfg_marital_over_age
# Set up the figure
fig, axes = plt.subplots(figsize=(18, 12), ncols=1, nrows=3)
# Parameters for building the plots in a loop: DataFrame, hue column, palette
param_dict={0:[dfg_sex_over_age, 'sex', 'Pastel1'], 1:[dfg_race_over_age, 'racedesc', 'Dark2'],
2:[dfg_marital_over_age, 'maritaldesc', 'tab10']}
# Define the X axis:
scale_step = 5 # tick step for the X axis
# Main tick range
scale_span = list(range((int(dfg_sex_over_age.age.min() / scale_step) * scale_step),
int(dfg_sex_over_age.age.max() + scale_step),
scale_step))
# Build the plots in a loop
for i in range(3):
    sns.lineplot(data=param_dict[i][0],
                 x='age',
                 y='empl_count',
                 hue=param_dict[i][1],
                 marker='o',
                 linewidth=3,
                 palette=param_dict[i][2],
                 alpha=0.7,
                 ax=axes[i])
    axes[i].set_xlim(scale_span[0],scale_span[-1]) # set the X-axis limits
    axes[i].set_xticks(scale_span) # set the X ticks on this subplot
    axes[i].tick_params(axis='x', labelsize=12) # set the tick label size
    axes[i].set_xlabel("Age", fontsize=14)
    axes[i].set_ylabel("Number of Employees", fontsize=14)
# Set the spacing between the plots
fig.tight_layout(h_pad=4)
# Set the figure title
fig.suptitle(
"Sex, race/ethnicity and marital status of current employees by age",
fontsize=16, x=0.50, y=1.05)
plt.show()
CONCLUSION
Sex composition of employees
Racial and ethnic composition of employees
Marital status of employees
sql_quiery = \
"""
SELECT
sex,
racedesc,
maritaldesc
FROM
hr_dataset
WHERE
"Employment Status" = 'Active' OR
"Employment Status" = 'Future Start' OR
"Employment Status" = 'Leave of Absence'
;
"""
dfg_social_over_social = pd.read_sql(sql_quiery, conn)
#dfg_social_over_social
# Build a grid of plots of employee counts by sex, race/ethnicity
# and marital status
# First group of plots: x="racedesc", hue="maritaldesc":
g=sns.catplot(data=dfg_social_over_social,
x="racedesc",
hue="maritaldesc",
col="sex",
kind='count',
ci=False,
palette="Dark2"
)
# Set the axis labels
g.set_xlabels("Race", fontsize=12)
g.set_ylabels("Number of Employees", fontsize=12)
# Set the X tick labels
g.set_xticklabels(rotation=60,fontsize=10)
# Second group of plots: x="maritaldesc", hue="racedesc":
g1=sns.catplot(data=dfg_social_over_social,
x="maritaldesc",
hue="racedesc",
col="sex",
kind='count',
ci=False,
palette="tab10"
)
# Set the axis labels
g1.set_xlabels("Marital status", fontsize=12)
g1.set_ylabels("Number of Employees", fontsize=12)
# Set the X tick labels
g1.set_xticklabels(rotation=60,fontsize=10)
# Set the title of the plots
g.fig.suptitle(
"Distribution of employees by sex, race/ethnicity and marital status",
fontsize=16, x=0.50, y=1.05)
plt.show()
CONCLUSION
HR staff should pay attention to the following issues and questions.
7.1. Salary dependencies.
# Drop the temporary views created earlier
sql_statement = \
"""
DROP VIEW IF EXISTS
EmployeesNameTrim,
EmployeesMissingSpaces,
EmployeesReducedSpaces,
ManagersNorm,
EmployeesAndManagers,
Level1,
Level2,
Level3,
Level4,
PositionsWithMultActiveEmployees,
salary_age_schedule,
EmployeesSourcePayrate,
EmployeesPerformancePayrate,
PerformancePerSex,
PerformancePerMarital,
PerformancePerRace,
PerformancePerDepartment,
PerformancePerChief,
PerformancePerSource,
kpi_s,
costs_per_employee
"""
conn.execute(sql_statement)
pd.set_option('display.max_rows', 60) # restore the default maximum number of displayed rows
pd.set_option('display.max_columns', 20) # restore the default maximum number of displayed columns
Your task is to prepare an analytical report for the SMM department of Skillbox.
The object of analysis is the Skillbox public page on VK (VKontakte).
Connect to the VK API and download the posts from the wall of the Skillbox public page for the period of interest (choose it yourself and justify the choice). Analyze how various factors (for example, publication time) affect user engagement (number of likes, comments, poll votes). Produce analytics by rubric (examples of rubrics: design battle, management game) found on the page. The relevant posts can be selected with regular expressions. Compile a list of recommendations for the SMM department based on the results of the analysis.
Additional instructions for working with the VK API are located here.
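As an illustration of the rubric selection and engagement analysis described in the task, here is a minimal offline sketch. It does not call the VK API: the posts, their field names (`published`, `text`, `likes`, `comments`) and the engagement measure (likes plus comments) are all assumptions made for the example. The rubric patterns use the Russian spellings of the two example rubrics, since the posts themselves are in Russian.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical posts; a real export from the wall would need to be mapped to this shape
posts = [
    {"published": datetime(2021, 1, 11, 10, 0), "text": "Дизайн-битва: выбираем лучший логотип", "likes": 40, "comments": 12},
    {"published": datetime(2021, 1, 12, 19, 0), "text": "Игра по управлению: новый раунд", "likes": 25, "comments": 5},
    {"published": datetime(2021, 1, 13, 10, 0), "text": "Итоги недели и ДИЗАЙН-БИТВА", "likes": 30, "comments": 8},
    {"published": datetime(2021, 1, 14, 19, 0), "text": "Новости курсов", "likes": 10, "comments": 0},
]

# One case-insensitive pattern per rubric
rubric_patterns = {
    "design battle": re.compile(r"дизайн[-\s]?битва", re.IGNORECASE),
    "management game": re.compile(r"игра по управлению", re.IGNORECASE),
}


def rubric_of(text):
    """Return the first rubric whose pattern occurs in the text, else 'other'."""
    for name, pattern in rubric_patterns.items():
        if pattern.search(text):
            return name
    return "other"


def mean_engagement(key):
    """Average likes + comments per post, grouped by key(post)."""
    buckets = defaultdict(list)
    for post in posts:
        buckets[key(post)].append(post["likes"] + post["comments"])
    return {group: sum(values) / len(values) for group, values in buckets.items()}


by_rubric = mean_engagement(lambda post: rubric_of(post["text"]))
by_hour = mean_engagement(lambda post: post["published"].hour)

print(by_rubric)
print(by_hour)
```

The same grouping function serves both the rubric and the publication-hour breakdown, so further factors (weekday, post length) can be added by passing another key function.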
### YOUR CODE HERE ###